VARIABLE METRIC INEXACT LINE SEARCH BASED METHODS FOR
NONSMOOTH OPTIMIZATION

S. BONETTINI, I. LORIS, F. PORTA, AND M. PRATO


Abstract. We develop a new proximal-gradient method for minimizing the sum of a differentiable, possibly nonconvex, function plus a convex, possibly nondifferentiable, function. The key features of the proposed method are the definition of a suitable descent direction, based on the proximal operator associated to the convex part of the objective function, and an Armijo-like rule to determine the step size along this direction ensuring the sufficient decrease of the objective function. In this framework, we especially address the possibility of adopting a metric which may change at each iteration and an inexact computation of the proximal point defining the descent direction. For the more general nonconvex case, we prove that all limit points of the sequence of iterates are stationary, while for convex objective functions we prove the convergence of the whole sequence to a minimizer, under the assumption that a minimizer exists. In the latter case, assuming also that the gradient of the smooth part of the objective function is Lipschitz, we also give a convergence rate estimate, showing the $O(1/k)$ complexity with respect to the function values. We also discuss verifiable sufficient conditions for the inexact proximal point and we present the results of a numerical experiment on a convex total variation based image restoration problem, showing that the proposed approach is competitive with another state-of-the-art method.

1. Introduction. In this paper we consider the problem

$$\min_{x\in\mathbb{R}^n} f(x) \equiv f_0(x) + f_1(x), \tag{1.1}$$


where $f_1$ is a proper, convex, lower semicontinuous function and $f_0$ is smooth, i.e. continuously differentiable, on an open subset of $\mathbb{R}^n$ containing $\mathrm{dom}(f_1) = \{x \in \mathbb{R}^n : f_1(x) < +\infty\}$.

We also assume that $f_1$ is bounded from below and that $\mathrm{dom}(f_1)$ is nonempty and closed. Formulation (1.1) also includes constrained problems over convex sets, which can be introduced by adding to $f_1$ the indicator function of the feasible set.

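When, in particular, $f_1$ reduces to the indicator function of a convex set $\Omega$, i.e. $f_1 = \iota_\Omega$ with

$$\iota_\Omega(x) = \begin{cases} 0 & \text{if } x \in \Omega,\\ +\infty & \text{if } x \notin \Omega,\end{cases}$$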


a simple and well studied algorithm for the solution of (1.1) is the gradient projection (GP) method, which is particularly appealing for large-scale problems. In recent years, several variants of this method have been proposed [7, 10, 18, 21], with the aim of accelerating its convergence which, for the basic implementation, can be very slow. In particular, reliable acceleration techniques have been proposed for the so-called gradient projection method with line search along the feasible direction [6, Chapter 2], whose iteration consists in


$$x^{(k+1)} = x^{(k)} + \lambda^{(k)}\left(y^{(k)} - x^{(k)}\right), \tag{1.2}$$

where $y^{(k)}$ is the Euclidean projection of the point $x^{(k)} - \nabla f_0(x^{(k)})$ onto the feasible set $\Omega$ and $\lambda^{(k)} \in [0,1]$ is a steplength parameter ensuring the sufficient decrease of the objective function. Typically, $\lambda^{(k)}$ is determined by means of a backtracking loop until an Armijo-type inequality is satisfied. Variants of the basic scheme are obtained by introducing a further variable stepsize parameter $\alpha_k$, which controls the step along the gradient, in combination with a variable choice of the underlying metric. In practice, the point $y^{(k)}$ can be defined as

$$y^{(k)} = \operatorname*{argmin}_{y\in\Omega}\; \nabla f_0(x^{(k)})^T (y - x^{(k)}) + \frac{1}{2\alpha_k}(y - x^{(k)})^T D_k (y - x^{(k)}), \tag{1.3}$$

where $\alpha_k$ is a positive parameter and $D_k \in \mathbb{R}^{n\times n}$ is a symmetric positive definite matrix. The stepsizes $\alpha_k$ and the matrices $D_k$ have to be considered as "free" parameters of the method and a clever choice of them can lead to significant improvements in the practical convergence behaviour [7, 8, 10].
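A minimal sketch of one iteration of (1.2)-(1.3), under assumptions of our own chosen for illustration: the feasible set is a box $\Omega = [lo, hi]^n$ and $D_k$ is diagonal, so that the scaled projection (1.3) separates into componentwise clipping. The helper names and the Armijo constants are hypothetical, not taken from the references.

```python
def clip(v, lo, hi):
    """Project a scalar onto the interval [lo, hi]."""
    return max(lo, min(hi, v))


def gp_step(f, grad, x, alpha, dvec, lo, hi, beta=1e-4, delta=0.5):
    """One scaled gradient projection step (1.2)-(1.3) on a box (illustrative sketch).

    Assumes D_k = diag(dvec) with positive entries: then (1.3) is separable
    and y_i = clip(x_i - alpha * g_i / d_i, lo, hi).
    """
    g = grad(x)
    y = [clip(xi - alpha * gi / di, lo, hi) for xi, gi, di in zip(x, g, dvec)]
    d = [yi - xi for yi, xi in zip(y, x)]
    slope = sum(gi * di for gi, di in zip(g, d))  # grad f0(x)^T d <= 0
    fx, lam = f(x), 1.0
    # Backtracking on lambda until an Armijo-type inequality holds
    # (if d = 0 the test is satisfied immediately and x is returned unchanged).
    while f([xi + lam * di for xi, di in zip(x, d)]) > fx + beta * lam * slope:
        lam *= delta
    return [xi + lam * di for xi, di in zip(x, d)]


# Usage: f0(x) = ||x - c||^2 / 2 on the box [-1, 1]^2, with a non-identity D_k.
c = [2.0, -3.0]
f = lambda x: 0.5 * sum((xi - ci) ** 2 for xi, ci in zip(x, c))
grad = lambda x: [xi - ci for xi, ci in zip(x, c)]
x = [0.0, 0.0]
for _ in range(10):
    x = gp_step(f, grad, x, alpha=1.0, dvec=[4.0, 1.0], lo=-1.0, hi=1.0)
# x is now the constrained minimizer [1.0, -1.0]
```

Changing `dvec` changes the path taken towards the solution but not the fixed point, which is the sense in which $\alpha_k$ and $D_k$ act as free parameters.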

In this paper we generalize the GP scheme (1.2)-(1.3) by introducing the concept of descent direction for the case where $f_1$ is a general convex function, and we propose a suitable variant of the Armijo rule for the nonsmooth problem (1.1). In particular, we focus on the case when the descent direction has the form $y^{(k)} - x^{(k)}$, with

$$y^{(k)} = \operatorname*{argmin}_{y\in\mathbb{R}^n}\; \nabla f_0(x^{(k)})^T (y - x^{(k)}) + d_{\sigma^{(k)}}(y, x^{(k)}) + f_1(y) - f_1(x^{(k)}), \tag{1.4}$$

where $d_{\sigma^{(k)}}(\cdot,\cdot)$ plays the role of a distance function, depending on the parameter $\sigma^{(k)} \in \mathbb{R}^q$. Clearly, (1.4) is a generalization of (1.3), which is recovered when $f_1 = \iota_\Omega$, by setting $d_\sigma(y,x) = \frac{1}{2\alpha}(y - x)^T D (y - x)$, with $\sigma = (\alpha, D)$.

Formally, the scheme (1.2)-(1.4) is a forward-backward (or proximal gradient) method [15, 16] depending on the parameters $\lambda^{(k)}, \sigma^{(k)}$.

In particular, we investigate in depth the variant of the scheme (1.2)-(1.4) where the minimization problem in (1.4) is solved inexactly, and we devise two types of admissible approximations. We show that both approximation types can be practically computed when $f_1(x) = g(Ax)$, where $A \in \mathbb{R}^{m\times n}$ and $g: \mathbb{R}^m \to \bar{\mathbb{R}}$ is a proper, convex, lower semicontinuous function with an easy-to-compute resolvent operator. In this case, our scheme is a double-loop method, whose inner loop is equipped with an implementable stopping criterion. For general $f_0$, we are able to prove that any limit point of the sequence generated by our inexact scheme is stationary for problem (1.1). The proof of this fact is essentially based on the properties of the Armijo-type rule adopted for computing $\lambda^{(k)}$, and it does not require any Lipschitz property of the gradient of $f_0$. When $f_0$ is convex, we prove a stronger result, showing that the iterates converge to a minimizer of (1.1), if one exists. In the latter case, under the further assumption that $\nabla f_0$ is Lipschitz continuous, we give an $O(1/k)$ convergence rate estimate for the objective function values. Our analysis includes as special cases several state-of-the-art methods, such as those in [7, 9, 10, 26, 32].

Forward-backward algorithms based on a variable metric have also been studied recently in [14] for the convex case and in [13] for the nonconvex case under the Kurdyka-Łojasiewicz assumption (see also [20]). Even if our scheme is formally very similar to those in [13, 14], the involved parameters have a substantially different meaning. In our case, the theoretical convergence is ensured by the Armijo parameter $\lambda^{(k)}$ in combination with the descent direction properties; this results in an almost complete freedom to choose the other algorithm parameters (e.g. $\alpha_k$ and $D_k$), without necessarily relating them to the Lipschitz constant of $\nabla f_0$ (actually, our analysis, except for the convergence rate estimate, is performed without this assumption). We believe that this is also one of the main strengths of our method, since acceleration techniques based on suitable choices of $\alpha_k$ and $D_k$, originally proposed for smooth optimization, can be adopted, leading to an improvement of the practical performance. The other crucial ingredient of our method is the inexact computation of the minimizer in (1.4): this issue has been considered in several papers in the context of proximal and proximal gradient methods (see for example [1, 13, 31, 33] and references therein). The approach we follow in this paper is closer to the one proposed in [33] and has the advantage of providing an implementable condition for the approximate computation of the proximal point. Moreover, we also generalize the ideas proposed in [7] for the inexact computation of the projection onto a convex set. Finally, we also mention the papers [2, 3, 4, 19] for the use of non-Euclidean distances in the context of forward-backward and proximal methods.

The paper is organized as follows: some background material is collected in Section 2, while the concept of descent direction for problem (1.1) is presented and developed in Section 3. In Section 4, the modified Armijo rule is discussed. Then, a general convergence result for line-search descent algorithms based on this rule is proved in the nonconvex case. Two different inexactness criteria, called $\epsilon$-type and $\eta$-type, are proposed in Sections 4.2 and 4.3, and the related implementation is discussed in Sections 5.1 and 5.4. Section 4.5 deals with the convex case, where the convergence of an $\epsilon$-approximation based algorithm is proved and the related convergence rate is analyzed. The results of a numerical experiment on a total variation based image restoration problem are presented in Section 6, while our conclusions are given in Section 7.

Notation. We denote the extended real numbers set as $\bar{\mathbb{R}} = \mathbb{R} \cup \{-\infty, +\infty\}$ and by $\mathbb{R}_{\geq 0}$, $\mathbb{R}_{>0}$ the sets of non-negative and positive real numbers, respectively. The scaled Euclidean norm of an $n$-vector $x$, associated to a symmetric positive definite matrix $D$, is $\|x\|_D = \sqrt{x^T D x}$. Given $\mu \geq 1$, we denote by $\mathcal{M}_\mu$ the set of all symmetric positive definite matrices with all eigenvalues contained in the interval $[\frac{1}{\mu}, \mu]$. For any $D \in \mathcal{M}_\mu$ we have that $D^{-1}$ also belongs to $\mathcal{M}_\mu$ and

$$\frac{1}{\mu}\|x\|^2 \leq \|x\|_D^2 \leq \mu\|x\|^2 \tag{1.5}$$

for any $x \in \mathbb{R}^n$.

2. Definitions and basic properties. We recall the following definitions. 

Definition 2.1 [29, p. 213] Let $f$ be any function from $\mathbb{R}^n$ to $\bar{\mathbb{R}}$. The one-sided directional derivative of $f$ at $x$ with respect to a vector $d$ is defined as

$$f'(x; d) = \lim_{\lambda \downarrow 0} \frac{f(x + \lambda d) - f(x)}{\lambda} \tag{2.1}$$

if the limit on the right-hand side exists in $\bar{\mathbb{R}}$.

When $f$ is smooth at $x$, then $f'(x; d) = \nabla f(x)^T d$. When $f$ is convex, its directional derivative has the following property.

Theorem 2.1 [29, Theorem 23.1] If $f$ is convex and $x \in \mathrm{dom}(f)$, then for any $d \in \mathbb{R}^n$ the limit on the right-hand side of (2.1) exists and $f'(x; d) = \inf_{\lambda > 0} \frac{f(x + \lambda d) - f(x)}{\lambda}$.

As a consequence of the previous theorem, for any convex function $f$ we have that $f'(x; d)$ exists for any $x \in \mathrm{dom}(f)$, $d \in \mathbb{R}^n$, and

$$f'(x; d) \leq f(x + d) - f(x). \tag{2.2}$$
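For a convex $f$, Theorem 2.1 implies that the difference quotient in (2.1) is nonincreasing as $\lambda \downarrow 0$, so $f'(x; d)$ is its infimum, and (2.2) is just the quotient at $\lambda = 1$. The sketch below checks both facts numerically on an illustrative function of our choosing, $f(t) = |t| + t^2$ at $x = 0$, where $f'(0; d) = |d|$ although $f$ is not differentiable at $0$.

```python
def difference_quotients(f, x, d, num=40):
    """Quotients (f(x + lam*d) - f(x)) / lam for lam = 1, 1/2, 1/4, ..."""
    lams = [2.0 ** (-j) for j in range(num)]
    return [(f(x + lam * d) - f(x)) / lam for lam in lams]


f = lambda t: abs(t) + t * t  # convex, nonsmooth at t = 0

estimates = {}
for d in (1.0, -1.0):
    q = difference_quotients(f, 0.0, d)
    assert all(a >= b for a, b in zip(q, q[1:]))  # Theorem 2.1: monotone in lam
    assert min(q) <= f(0.0 + d) - f(0.0)          # inequality (2.2)
    estimates[d] = min(q)                         # approximates f'(0; d) = 1
```

Both one-sided derivatives equal $1$, so $f'(0; 1) + f'(0; -1) = 2 > 0$, which is how the nondifferentiability at $0$ shows up in (2.1).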

Definition 2.2 [32, p. 394] A point $x$ is stationary for problem (1.1) if $x \in \mathrm{dom}(f)$ and

$$f'(x; d) \geq 0 \quad \forall d \in \mathbb{R}^n. \tag{2.3}$$

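Definition 2.3 [20, §2.3] The proximity or resolvent operator associated to a convex function $f: \mathbb{R}^n \to \bar{\mathbb{R}}$ in the metric induced by a symmetric positive definite matrix $D$ is defined as

$$\mathrm{prox}_f^D(x) = \operatorname*{argmin}_{z\in\mathbb{R}^n}\; f(z) + \frac{1}{2}\|z - x\|_D^2, \quad \forall x \in \mathbb{R}^n.$$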

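We remark that $\mathrm{prox}_f^D$ is a Lipschitz continuous function: it is nonexpansive with respect to $\|\cdot\|_D$, hence its Lipschitz constant with respect to the Euclidean norm is not larger than $\sqrt{\|D\|\,\|D^{-1}\|}$.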
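As an illustration of Definition 2.3 (our example, not from the text), take $f(z) = \tau\|z\|_1$ and a diagonal $D = \mathrm{diag}(d_1, \dots, d_n)$: the minimization separates and $\mathrm{prox}_f^D(x)_i$ is the soft-thresholding of $x_i$ at level $\tau/d_i$. The sketch below, with hypothetical helper names, also compares one component against a brute-force grid search of the defining objective.

```python
import math


def soft(v, t):
    """Scalar soft-thresholding: argmin_z t*|z| + (z - v)**2 / 2."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def prox_l1_diag(x, tau, dvec):
    """prox_f^D(x) for f = tau*||.||_1 and D = diag(dvec), dvec > 0.

    Component i minimizes tau*|z| + d_i*(z - x_i)**2 / 2, i.e. soft(x_i, tau/d_i).
    """
    return [soft(xi, tau / di) for xi, di in zip(x, dvec)]


p = prox_l1_diag([3.0, -0.5, 3.0], tau=1.0, dvec=[1.0, 1.0, 2.0])
# p == [2.0, -0.0, 2.5]: the metric changes the threshold componentwise.

# Brute-force check of the third component on a fine grid.
obj = lambda z: 1.0 * abs(z) + 2.0 * (z - 3.0) ** 2 / 2.0
grid = [i / 1000.0 for i in range(-5000, 5001)]
z_best = min(grid, key=obj)
```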

Definition 2.4 Let $f: \mathbb{R}^n \to \bar{\mathbb{R}}$ be a convex function. The conjugate function of $f$ is the function $f^*: \mathbb{R}^n \to \bar{\mathbb{R}}$ defined as $f^*(y) = \sup_{x\in\mathbb{R}^n} \{x^T y - f(x)\}$, $\forall y \in \mathbb{R}^n$.

The following proposition states a useful property of the conjugate.


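Proposition 2.1 Let $f: \mathbb{R}^n \to \bar{\mathbb{R}}$, $g: \mathbb{R}^m \to \bar{\mathbb{R}}$ be two convex functions and $A \in \mathbb{R}^{m\times n}$. If $f(x) = g(Ax)$, then $f^*(A^T y) \leq g^*(y)$, $\forall y \in \mathbb{R}^m$.

Proof. By Definition 2.4 we have

$$f^*(A^T y) = \sup_{x\in\mathbb{R}^n} \{x^T A^T y - f(x)\} = \sup_{x\in\mathbb{R}^n} \{(Ax)^T y - g(Ax)\} = \sup_{z\in\mathbb{R}^m,\, z = Ax} \{z^T y - g(z)\} \leq \sup_{z\in\mathbb{R}^m} \{z^T y - g(z)\} = g^*(y).$$

□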


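Definition 2.5 [35, p. 82] Given $\epsilon \in \mathbb{R}_{\geq 0}$, the $\epsilon$-subdifferential of a convex function $f: \mathbb{R}^n \to \bar{\mathbb{R}}$ at a point $z \in \mathbb{R}^n$ is the set

$$\partial_\epsilon f(z) = \{w \in \mathbb{R}^n : f(x) \geq f(z) + (x - z)^T w - \epsilon,\ \forall x \in \mathbb{R}^n\}. \tag{2.4}$$

If $\epsilon > 0$, $f$ is proper and lower semicontinuous and $z \in \mathrm{dom}(f)$, then $\partial_\epsilon f(z) \neq \emptyset$. For $\epsilon = 0$ the usual subdifferential set $\partial f(z)$ is recovered. A useful property of the $\epsilon$-subdifferential is the following.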


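Proposition 2.2 [35, Theorem 2.4.4 (iv)] Let $f: \mathbb{R}^n \to \bar{\mathbb{R}}$ be a convex, proper, lower semicontinuous function. Then for any $\epsilon \in \mathbb{R}_{\geq 0}$ and for any $x \in \mathbb{R}^n$ we have $x^* \in \partial_\epsilon f(x) \iff x \in \partial_\epsilon f^*(x^*)$.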
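To make Definition 2.5 and Proposition 2.2 concrete, take $f(x) = x^2/2$ on $\mathbb{R}$ (our example). Minimizing $f(x) - f(z) - (x - z)w + \epsilon$ over $x$ shows that $\partial_\epsilon f(z) = [z - \sqrt{2\epsilon}, z + \sqrt{2\epsilon}]$; since $f^* = f$ here, Proposition 2.2 reduces to the symmetry of this interval condition in $(z, w)$. The sketch compares the closed form with a check of (2.4) on a finite grid of test points, so the grid check is only an approximation of the definition.

```python
import math


def in_eps_subdiff_closed_form(w, z, eps):
    """w in the eps-subdifferential of f(x) = x**2/2 at z  <=>  |w - z| <= sqrt(2*eps)."""
    return abs(w - z) <= math.sqrt(2.0 * eps)


def in_eps_subdiff_grid(f, w, z, eps, grid):
    """Check inequality (2.4) only at the finitely many points x in grid."""
    return all(f(x) >= f(z) + (x - z) * w - eps for x in grid)


f = lambda x: 0.5 * x * x
grid = [i / 100.0 for i in range(-1000, 1001)]
z, eps = 1.0, 0.5  # closed form gives the interval [0, 2]

for w in (-0.1, 0.0, 1.0, 1.9, 2.1):
    assert in_eps_subdiff_closed_form(w, z, eps) == in_eps_subdiff_grid(f, w, z, eps, grid)

# Proposition 2.2 with f* = f: x* in d_eps f(x)  <=>  x in d_eps f*(x*).
assert in_eps_subdiff_closed_form(1.9, 1.0, eps) == in_eps_subdiff_closed_form(1.0, 1.9, eps)
```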

3. A family of descent directions. When $f$ is smooth, a vector $d \in \mathbb{R}^n$ is said to be a descent direction for $f$ at $x$ when $\nabla f(x)^T d < 0$. In the nonsmooth case (1.1), we give the following definition, based on the directional derivative.


Definition 3.1 A vector $d \in \mathbb{R}^n$ is a descent direction for $f$ at $x \in \mathrm{dom}(f)$ if $f'(x; d) < 0$.

Thanks to Theorem 2.1, the previous definition is well posed. In this section we define a family of descent 
directions for problem (1.1). To this end, we define the following set of non-negative functions. 

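Given a convex set $\Omega \subseteq \mathbb{R}^n$ and a set of parameters $\mathcal{S} \subseteq \mathbb{R}^q$, we denote by $\mathcal{D}(\Omega, \mathcal{S})$ the set of all distance-like functions $d_\sigma: \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}_{\geq 0} \cup \{+\infty\}$, continuously depending on $\sigma \in \mathcal{S}$, such that for all $z, x \in \Omega$ we have:

$(\mathcal{D}_1)$ $d_\sigma(z,x)$ is continuous in $(\sigma, z, x)$;

$(\mathcal{D}_2)$ $d_\sigma(z,x)$ is smooth w.r.t. $z \in \Omega$;

$(\mathcal{D}_3)$ $d_\sigma(z,x)$ is strongly convex w.r.t. $z$:

$$d_\sigma(z_2, x) \geq d_\sigma(z_1, x) + \nabla_1 d_\sigma(z_1, x)^T (z_2 - z_1) + \frac{m}{2}\|z_2 - z_1\|^2 \quad \forall z_1, z_2 \in \Omega,$$

where $m > 0$ does not depend on $\sigma$ or $x$ (here $\nabla_1$ denotes the gradient with respect to the first argument of a function);

$(\mathcal{D}_4)$ $d_\sigma(z,x) = 0$ if and only if $z = x$ (which implies that $\nabla_1 d_\sigma(x,x) = 0$ for all $x \in \Omega$).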

The scaled Euclidean distance

$$d_\sigma(x,y) = \frac{1}{2\alpha}\|x - y\|_D^2, \tag{3.1}$$

with $\sigma = (\alpha, D)$, where $\alpha > 0$ and $D \in \mathbb{R}^{n\times n}$ is a symmetric positive definite matrix, is an interesting example of a function in $\mathcal{D}(\mathbb{R}^n, \mathcal{S})$. Other examples of distance-like functions can be obtained by considering Bregman distances associated to a strongly convex function.

For a given array of parameters $\sigma \in \mathcal{S} \subseteq \mathbb{R}^q$, let us introduce the function $h_\sigma: \mathbb{R}^n \times \mathbb{R}^n \to \bar{\mathbb{R}}$ defined as

$$h_\sigma(z,x) = \nabla f_0(x)^T (z - x) + d_\sigma(z,x) + f_1(z) - f_1(x) \quad \forall z, x \in \mathbb{R}^n, \tag{3.2}$$

where $d_\sigma \in \mathcal{D}(\Omega, \mathcal{S})$ and $\Omega = \mathrm{dom}(f_1)$. We remark that $h_\sigma$ depends continuously on $\sigma$, as $d_\sigma$ does. Moreover, since $d_\sigma(\cdot, x)$ and $f_1$ are convex, proper and lower semicontinuous, $h_\sigma(\cdot, x)$ is also convex, proper and lower semicontinuous for all $x \in \Omega$. Finally, for any point $x \in \Omega$ and for any $d \in \mathbb{R}^n$ we have

$$h_\sigma'(x, x; d) = f'(x; d), \tag{3.3}$$

where $h_\sigma'(z, x; d)$ denotes the directional derivative of $h_\sigma(\cdot, x)$ at the point $z$ with respect to $d$. From assumption $(\mathcal{D}_3)$, it follows that $h_\sigma(\cdot, x)$ is strongly convex and admits a unique minimum point for any $x \in \Omega$.

Now we introduce the following operator $p(\cdot\,; h_\sigma): \Omega \to \Omega$ associated to any function $h_\sigma$ of the form (3.2):

$$p(x; h_\sigma) = \operatorname*{argmin}_{z\in\mathbb{R}^n} h_\sigma(z, x). \tag{3.4}$$

When $d_\sigma$ is chosen as in (3.1), the operator (3.4) becomes

$$p(x; h_\sigma) = \mathrm{prox}_{f_1}^{\alpha^{-1}D}\left(x - \alpha D^{-1}\nabla f_0(x)\right).$$
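A sketch of the operator (3.4) with the scaled Euclidean distance (3.1), under illustrative assumptions of our own: $D = \mathrm{diag}(d_1, \dots, d_n)$ and $f_1 = \tau\|\cdot\|_1$, so that $\mathrm{prox}_{f_1}^{\alpha^{-1}D}$ acts componentwise as soft-thresholding at level $\alpha\tau/d_i$. It also evaluates $h_{\sigma,\gamma}$ as in (3.5) (introduced just below), so that the sign properties of Proposition 3.2 can be observed; all function names are hypothetical.

```python
import math


def soft(v, t):
    return math.copysign(max(abs(v) - t, 0.0), v)


def p_operator(x, grad_f0, alpha, dvec, tau):
    """p(x; h_sigma) = prox_{f1}^{D/alpha}(x - alpha * D^{-1} grad f0(x)), f1 = tau*||.||_1."""
    g = grad_f0(x)
    z = [xi - alpha * gi / di for xi, gi, di in zip(x, g, dvec)]
    return [soft(zi, alpha * tau / di) for zi, di in zip(z, dvec)]


def h_sigma_gamma(y, x, grad_f0, alpha, dvec, tau, gamma=1.0):
    """h_{sigma,gamma}(y, x) with d_sigma as in (3.1) and f1 = tau*||.||_1."""
    g = grad_f0(x)
    lin = sum(gi * (yi - xi) for gi, yi, xi in zip(g, y, x))
    dist = sum(di * (yi - xi) ** 2 for di, yi, xi in zip(dvec, y, x)) / (2.0 * alpha)
    f1 = lambda v: tau * sum(abs(vi) for vi in v)
    return lin + gamma * dist + f1(y) - f1(x)


# Example: f0(x) = ||x - c||^2 / 2, so grad f0(x) = x - c.
c = [3.0, 0.2]
grad_f0 = lambda x: [xi - ci for xi, ci in zip(x, c)]
x = [0.0, 0.0]
y = p_operator(x, grad_f0, alpha=1.0, dvec=[1.0, 1.0], tau=1.0)   # [2.0, 0.0]
h = h_sigma_gamma(y, x, grad_f0, 1.0, [1.0, 1.0], 1.0)            # -2.0 < 0, so y != x
```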

Under assumption $(\mathcal{D}_3)$, one can show that $p(x; h_\sigma)$ depends continuously on $(x, \sigma)$.

Proposition 3.1 Let $d_\sigma \in \mathcal{D}(\Omega, \mathcal{S})$ and $h_\sigma$ be defined as in (3.2). Then $p(x; h_\sigma)$ depends continuously on $(x, \sigma)$.

Proof. Let $y = \operatorname*{argmin}_{z\in\mathbb{R}^n} h_\sigma(z, x)$. Then $y$ is characterized by the equation $\nabla f_0(x) + \nabla_1 d_\sigma(y, x) + w = 0$, where $w \in \partial f_1(y)$. It follows that $f_1(u) \geq f_1(y) + w^T(u - y)$ for all $u \in \mathbb{R}^n$, or:

$$f_1(u) \geq f_1(y) - \left(\nabla f_0(x) + \nabla_1 d_\sigma(y, x)\right)^T (u - y) \quad \forall u \in \mathbb{R}^n.$$

Assumption $(\mathcal{D}_3)$ expressed in $y$ and $u$ gives:

$$d_\sigma(u, x) \geq d_\sigma(y, x) + \nabla_1 d_\sigma(y, x)^T (u - y) + \frac{m}{2}\|y - u\|^2 \quad \forall u \in \mathbb{R}^n.$$

Together, these two inequalities yield:

$$\frac{m}{2}\|y - u\|^2 \leq f_1(u) - f_1(y) + d_\sigma(u, x) - d_\sigma(y, x) + \nabla f_0(x)^T (u - y) \quad \forall u \in \mathbb{R}^n.$$

Let $y_1 = p(x_1; h_{\sigma_1})$ and $y_2 = p(x_2; h_{\sigma_2})$. Adding the previous inequality for $y = y_1$ (resp. $y = y_2$) and choosing $u = y_2$ (resp. $u = y_1$), one finds:

$$m\|y_1 - y_2\|^2 \leq d_{\sigma_1}(y_2, x_1) - d_{\sigma_1}(y_1, x_1) + d_{\sigma_2}(y_1, x_2) - d_{\sigma_2}(y_2, x_2) + \left(\nabla f_0(x_1) - \nabla f_0(x_2)\right)^T (y_2 - y_1)$$

and hence:

$$m\|y_1 - y_2\|^2 \leq d_{\sigma_2}(y_1, x_2) - d_{\sigma_1}(y_1, x_1) + d_{\sigma_1}(y_2, x_1) - d_{\sigma_2}(y_2, x_2) + \|\nabla f_0(x_1) - \nabla f_0(x_2)\|\,\|y_2 - y_1\|.$$

It follows that $0 \leq \|y_1 - y_2\| \leq \left(b + \sqrt{b^2 + 4cm}\right)/(2m)$, where $b = \|\nabla f_0(x_1) - \nabla f_0(x_2)\|$ and $c = d_{\sigma_2}(y_1, x_2) - d_{\sigma_1}(y_1, x_1) + d_{\sigma_1}(y_2, x_1) - d_{\sigma_2}(y_2, x_2)$. As $f_0$ is $C^1$, one has $\lim_{x_2 \to x_1} b = 0$. As $d_\sigma(z, x)$ is continuous in $(\sigma, z, x)$, one also has $\lim_{(\sigma_2, x_2) \to (\sigma_1, x_1)} c = 0$. This shows that $\lim_{(\sigma_2, x_2) \to (\sigma_1, x_1)} \|y_2 - y_1\| = 0$; in other words, $p(x_1; h_{\sigma_1})$ is continuous in $(\sigma_1, x_1)$. □

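Given a function $d_\sigma \in \mathcal{D}(\Omega, \mathcal{S})$, we also introduce the function $h_{\sigma,\gamma}: \mathbb{R}^n \times \mathbb{R}^n \to \bar{\mathbb{R}}$ defined as

$$h_{\sigma,\gamma}(z, x) = \nabla f_0(x)^T (z - x) + \gamma d_\sigma(z, x) + f_1(z) - f_1(x) \quad \forall z, x \in \mathbb{R}^n \tag{3.5}$$

for some $\gamma \in [0, 1]$. We have

$$h_{\sigma,\gamma}(y, x) \leq h_\sigma(y, x) \quad \forall x, y \in \mathbb{R}^n \tag{3.6}$$

and $h_{\sigma,\gamma} = h_\sigma$ when $\gamma = 1$. In the following we will show that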
• the stationarity condition (2.3) can be reformulated in terms of fixed points of the operator $p(\cdot\,; h_\sigma)$;

• the negative sign of $h_{\sigma,\gamma}$ detects a descent direction.

To this purpose, we collect in the following proposition some properties of the function $h_\sigma$ and the associated operator $p(\cdot\,; h_\sigma)$.

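Proposition 3.2 Let $\sigma \in \mathcal{S} \subseteq \mathbb{R}^q$, $\gamma \in [0, 1]$, and let $h_\sigma$, $h_{\sigma,\gamma}$ be defined as in (3.2), (3.5), where $d_\sigma \in \mathcal{D}(\Omega, \mathcal{S})$. If $x \in \Omega$ and $y = p(x; h_\sigma)$, then:

(a) $h_{\sigma,\gamma}(x, x) = 0$;

(b) if $z \in \mathbb{R}^n$ and $h_{\sigma,\gamma}(z, x) < 0$, then $f'(x; z - x) < 0$;

(c) $h_{\sigma,\gamma}(y, x) \leq 0$, and $h_{\sigma,\gamma}(y, x) = 0$ if and only if $y = x$;

(d) $f'(x; y - x) \leq 0$, and the equality holds if and only if $h_{\sigma,\gamma}(y, x) = 0$ (that is, if and only if $x = y$).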

Proof. (a) is a direct consequence of definition (3.5) and condition $(\mathcal{D}_4)$ on $d_\sigma$.

(b) If $h_{\sigma,\gamma}(z, x) < 0$, we have

$$0 \geq -\gamma d_\sigma(z, x) > \nabla f_0(x)^T (z - x) + f_1(z) - f_1(x) \geq \nabla f_0(x)^T (z - x) + f_1'(x; z - x) = f'(x; z - x),$$

where the second inequality follows from definition (3.5) of $h_{\sigma,\gamma}$ and the third one from (2.2).

(c) Since $y$ is the minimum point of $h_\sigma(\cdot, x)$, part (a) with $\gamma = 1$ yields $h_\sigma(y, x) \leq 0$ which, in view of (3.6), gives $h_{\sigma,\gamma}(y, x) \leq 0$. If $y = x$, part (a) implies $h_{\sigma,\gamma}(y, x) = 0$. Conversely, assume $h_{\sigma,\gamma}(y, x) = 0$. From inequality (3.6) we have $h_\sigma(y, x) \geq 0$. On the other hand, since $y$ is the minimum point of $h_\sigma(\cdot, x)$, part (a) with $\gamma = 1$ implies $h_\sigma(y, x) \leq 0$. Thus $h_\sigma(y, x) = 0$ and, since $y$ is the unique minimizer of $h_\sigma(\cdot, x)$, we can conclude that $x = y$.

(d) From (c) we have $h_{\sigma,\gamma}(y, x) \leq 0$. When $h_{\sigma,\gamma}(y, x) < 0$, part (b) implies $f'(x; y - x) < 0$. When $h_{\sigma,\gamma}(y, x) = 0$, from (c) we obtain $y = x$ and, therefore, $f'(x; y - x) = 0$. Conversely, assume $f'(x; y - x) = 0$. This implies

$$0 = \nabla f_0(x)^T (y - x) + f_1'(x; y - x) \leq \nabla f_0(x)^T (y - x) + f_1(y) - f_1(x) \leq h_{\sigma,\gamma}(y, x).$$

Since $h_{\sigma,\gamma}(y, x) \leq 0$, we necessarily have $h_{\sigma,\gamma}(y, x) = 0$. □

The following proposition completely characterizes the stationary points of (1.1) in two equivalent ways: as fixed points of the operator $p(\cdot\,; h_\sigma)$, i.e. the solutions of the equation $x = p(x; h_\sigma)$, or as roots of the composite function $r_{\sigma,\gamma}(x) = h_{\sigma,\gamma}(p(x; h_\sigma), x)$.

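Proposition 3.3 Let $\mathcal{S} \subseteq \mathbb{R}^q$, $\sigma \in \mathcal{S}$, $\gamma \in [0, 1]$, let $h_\sigma$, $h_{\sigma,\gamma}$ be defined as in (3.2), (3.5), and let $x \in \Omega$ and $y = p(x; h_\sigma)$. The following statements are equivalent:

(a) $x$ is stationary for problem (1.1);

(b) $x = y$;

(c) $h_{\sigma,\gamma}(y, x) = 0$.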

Proof. (a) $\iff$ (b) Assume that $x = y$. Then $h_\sigma(\cdot, x)$ achieves its minimum at $x$, and inequality (2.3) applied to it yields $h_\sigma'(x, x; z - x) \geq 0$ $\forall z \in \mathbb{R}^n$. Recalling (3.3) we have $h_\sigma'(x, x; z - x) = f'(x; z - x)$, hence $x$ is a stationary point for problem (1.1).

Conversely, let $x \in \Omega$ be a stationary point of (1.1) and assume by contradiction that $x \neq y$. Then, by Proposition 3.2 (d), we obtain $f'(x; y - x) < 0$, which contradicts the stationarity assumption on $x$.
(b) $\iff$ (c) See Proposition 3.2 (c). □
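Proposition 3.3 yields two computable stationarity tests, the fixed-point residual $x - p(x; h_\sigma)$ and the value $h_{\sigma,\gamma}(p(x; h_\sigma), x)$, whose vanishing does not depend on the choice of $\sigma$. The one-dimensional sketch below uses an example of our own: $f_0(x) = (x - a)^2/2$, $f_1$ the indicator of $[0, +\infty)$ and $d_\sigma$ as in (3.1) with $D = d > 0$, for which $p(x; h_\sigma) = \max\{x - (\alpha/d)(x - a), 0\}$ and the unique stationary point is $\max\{a, 0\}$.

```python
def p_nonneg(x, a, alpha, dval):
    """p(x; h_sigma) for f0 = (x - a)**2/2, f1 = indicator of [0, inf), D = dval."""
    return max(x - alpha * (x - a) / dval, 0.0)


def h_gamma(y, x, a, alpha, dval, gamma):
    """h_{sigma,gamma}(y, x) from (3.5); f1(y) - f1(x) = 0 for y, x >= 0."""
    return (x - a) * (y - x) + gamma * dval * (y - x) ** 2 / (2.0 * alpha)


a = -1.0          # the unique stationary point is max(a, 0) = 0
results = []
for alpha in (0.1, 1.0, 10.0):
    for dval in (0.5, 2.0):
        for x in (0.0, 0.5, 3.0):
            y = p_nonneg(x, a, alpha, dval)
            r = h_gamma(y, x, a, alpha, dval, gamma=0.5)
            # Proposition 3.3: x == y  <=>  r == 0  <=>  x is stationary;
            # Proposition 3.2(c): r <= 0 in all cases.
            results.append((x, x == y, r == 0.0, r <= 0.0))
```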

4. A line-search algorithm based on a modified Armijo rule. In this section we consider the modified Armijo rule described in Algorithm LS, which is a generalization of the one in [32]. Indeed, the rule proposed in [32] is recovered when $d_\sigma$ is chosen as in (3.1) and $\gamma \in [0, 1)$. In the following we will prove that Algorithm LS is well defined and that classical properties of the Armijo condition still hold for this modified case.
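
Algorithm LS Modified Armijo line-search algorithm

Let $\{x^{(k)}\}_{k\in\mathbb{N}}$, $\{y^{(k)}\}_{k\in\mathbb{N}}$ be two sequences of points in $\Omega$, and $\{\sigma^{(k)}\}_{k\in\mathbb{N}}$ be a sequence of parameters in $\mathcal{S}$. Choose some $\delta, \beta \in (0, 1)$, $\gamma \in [0, 1]$. For all $k \in \mathbb{N}$ compute $\lambda^{(k)}$ as follows:

1. Set $\lambda^{(k)} = 1$ and $d^{(k)} = y^{(k)} - x^{(k)}$.

2. If
$$f(x^{(k)} + \lambda^{(k)} d^{(k)}) \leq f(x^{(k)}) + \beta\lambda^{(k)}\Delta^{(k)}, \tag{4.1}$$
where
$$\Delta^{(k)} = h_{\sigma^{(k)},\gamma}(y^{(k)}, x^{(k)}), \tag{4.2}$$
then go to step 3.
Else set $\lambda^{(k)} = \delta\lambda^{(k)}$ and go to step 2.

3. End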
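A direct transcription of Algorithm LS, as a sketch: the function name, the list-based vectors and the safeguard `max_backtracks` are our additions (Proposition 4.1 below guarantees finite termination when (4.3) holds, so the safeguard only protects against floating-point pathologies). The caller supplies $\Delta^{(k)} = h_{\sigma^{(k)},\gamma}(y^{(k)}, x^{(k)})$ as in (4.2).

```python
def armijo_ls(f, x, y, Delta, beta=1e-4, delta=0.5, max_backtracks=100):
    """Algorithm LS (sketch): return (lambda, x + lambda * (y - x)).

    Delta = h_{sigma,gamma}(y, x) must be negative, cf. (4.3).
    """
    if not Delta < 0:
        raise ValueError("need Delta < 0 (condition (4.3)); y - x is not a descent direction")
    d = [yi - xi for yi, xi in zip(y, x)]          # step 1: d = y - x
    fx = f(x)
    lam = 1.0                                      # step 1: lambda = 1
    for _ in range(max_backtracks):
        trial = [xi + lam * di for xi, di in zip(x, d)]
        if f(trial) <= fx + beta * lam * Delta:    # step 2: test (4.1)
            return lam, trial                      # step 3: end
        lam *= delta                               # else: lambda = delta * lambda
    raise RuntimeError("backtracking safeguard reached")


# Usage on f(x) = (x - 3)**2/2 + |x| from x = 0. With d_sigma as in (3.1), alpha = 10 and
# D = 1, the point y = p(0; h_sigma) = 20 overshoots, and Delta = h_{sigma,1}(20, 0) = -20.
f = lambda v: 0.5 * (v[0] - 3.0) ** 2 + abs(v[0])
lam, x_new = armijo_ls(f, [0.0], [20.0], Delta=-20.0)
# Three reductions are needed: lam == 0.125 and x_new == [2.5].
```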


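Here and in the following we will define the function $h_\sigma(\cdot, \cdot)$ as in (3.2) and, for the sake of simplicity, we will make the following assumption:

(H0) $d_\sigma \in \mathcal{D}(\Omega, \mathcal{S})$, where $\Omega = \mathrm{dom}(f_1)$ and $\mathcal{S} \subseteq \mathbb{R}^q$ is a compact set.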

Proposition 4.1 Let $\{x^{(k)}\}_{k\in\mathbb{N}}$, $\{y^{(k)}\}_{k\in\mathbb{N}}$ be two sequences of points in $\Omega$, $\{\sigma^{(k)}\}_{k\in\mathbb{N}}$ a sequence of parameters in $\mathcal{S} \subseteq \mathbb{R}^q$ and $\gamma \in [0, 1]$. Assume that

$$h_{\sigma^{(k)},\gamma}(y^{(k)}, x^{(k)}) < 0 \tag{4.3}$$

for all $k$. Then the line-search Algorithm LS is well defined, i.e. for each $k \in \mathbb{N}$ the loop at step 2 terminates in a finite number of steps. If, in addition, we assume that $\{x^{(k)}\}_{k\in\mathbb{N}}$ and $\{y^{(k)}\}_{k\in\mathbb{N}}$ are bounded sequences and $f(x^{(k+1)}) \leq f(x^{(k)})$, then $\Delta^{(k)} = h_{\sigma^{(k)},\gamma}(y^{(k)}, x^{(k)})$ is bounded. Assuming also that

$$\lim_{k\to\infty} f(x^{(k)}) - f(x^{(k)} + \lambda^{(k)} d^{(k)}) = 0, \tag{4.4}$$

where $\lambda^{(k)}$ and $d^{(k)}$ are computed with Algorithm LS, then we have

$$\lim_{k\to\infty} h_{\sigma^{(k)},\gamma}(y^{(k)}, x^{(k)}) = 0.$$

Proof. We first prove that the loop at step 2 of Algorithm LS terminates in a finite number of steps for any $k \in \mathbb{N}$. Assume by contradiction that there exists a $k \in \mathbb{N}$ such that Algorithm LS performs an infinite number of reductions; thus, for any $j \in \mathbb{N}$, we have

$$\begin{aligned}
\beta\Delta^{(k)} &< \frac{f(x^{(k)} + \delta^j d^{(k)}) - f(x^{(k)})}{\delta^j} = \frac{f_0(x^{(k)} + \delta^j d^{(k)}) - f_0(x^{(k)})}{\delta^j} + \frac{f_1(x^{(k)} + \delta^j d^{(k)}) - f_1(x^{(k)})}{\delta^j}\\
&\leq \frac{f_0(x^{(k)} + \delta^j d^{(k)}) - f_0(x^{(k)})}{\delta^j} + \frac{\delta^j f_1(y^{(k)}) + (1 - \delta^j) f_1(x^{(k)}) - f_1(x^{(k)})}{\delta^j},
\end{aligned}$$

where the second inequality is obtained by means of the Jensen inequality applied to the convex function $f_1$. Taking limits on the right-hand side for $j \to \infty$ we obtain

$$\begin{aligned}
\beta\Delta^{(k)} &\leq \nabla f_0(x^{(k)})^T d^{(k)} + f_1(y^{(k)}) - f_1(x^{(k)})\\
&\leq \nabla f_0(x^{(k)})^T d^{(k)} + f_1(y^{(k)}) - f_1(x^{(k)}) + \gamma d_{\sigma^{(k)}}(y^{(k)}, x^{(k)})\\
&= \Delta^{(k)} < 0,
\end{aligned}$$

where the second inequality follows from the non-negativity of $d_\sigma \in \mathcal{D}(\Omega, \mathcal{S})$ and the last one from (4.3). Since $0 < \beta < 1$, this is a contradiction.

Assume now that $\{x^{(k)}\}_{k\in\mathbb{N}}$, $\{y^{(k)}\}_{k\in\mathbb{N}}$ are bounded sequences and that $f(x^{(k+1)}) \leq f(x^{(k)})$. We show that $\Delta^{(k)} = h_{\sigma^{(k)},\gamma}(y^{(k)}, x^{(k)})$ is bounded. By assumption (4.3), $h_{\sigma^{(k)},\gamma}(y^{(k)}, x^{(k)})$ is bounded from above. We show that it is also bounded from below. Indeed we have

$$\begin{aligned}
h_{\sigma^{(k)},\gamma}(y^{(k)}, x^{(k)}) &= \nabla f_0(x^{(k)})^T (y^{(k)} - x^{(k)}) + \gamma d_{\sigma^{(k)}}(y^{(k)}, x^{(k)}) + f_1(y^{(k)}) - f_1(x^{(k)})\\
&\geq \nabla f_0(x^{(k)})^T (y^{(k)} - x^{(k)}) + f_1(y^{(k)}) - f_1(x^{(k)})\\
&= \nabla f_0(x^{(k)})^T (y^{(k)} - x^{(k)}) + f_1(y^{(k)}) - f(x^{(k)}) + f_0(x^{(k)})\\
&\geq \nabla f_0(x^{(k)})^T (y^{(k)} - x^{(k)}) + f_1(y^{(k)}) - f(x^{(0)}) + f_0(x^{(k)}),
\end{aligned}$$

where the first inequality follows from the non-negativity of $d_\sigma$, the equality is obtained by adding and subtracting $f_0(x^{(k)})$, and the last inequality is a consequence of $f(x^{(k+1)}) \leq f(x^{(k)})$.

As $f_1$ is proper, convex and lower semicontinuous, it admits an affine minorant, i.e. there exist $a \in \mathbb{R}^n$ and $b \in \mathbb{R}$ such that $f_1(u) \geq a^T u + b$ for all $u \in \mathbb{R}^n$. Thus:

$$h_{\sigma^{(k)},\gamma}(y^{(k)}, x^{(k)}) \geq \nabla f_0(x^{(k)})^T (y^{(k)} - x^{(k)}) + a^T y^{(k)} + b - f(x^{(0)}) + f_0(x^{(k)}).$$

The right-hand side is a continuous function of $x^{(k)}$ and $y^{(k)}$. As these are assumed to lie in a closed and bounded set, the left-hand side is bounded (from below) as well.

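Let us show that the only limit point of $\{\Delta^{(k)}\}_{k\in\mathbb{N}}$ is zero. We observe that (4.1) and (4.3) give $0 \leq -\beta\lambda^{(k)}\Delta^{(k)} \leq f(x^{(k)}) - f(x^{(k)} + \lambda^{(k)} d^{(k)})$, so that (4.4) yields

$$\lim_{k\to\infty} \lambda^{(k)}\Delta^{(k)} = 0. \tag{4.5}$$

Assume that there exists a subset of indices $K \subseteq \mathbb{N}$ such that $\lim_{k\in K, k\to\infty} \Delta^{(k)} = \bar\Delta \in \mathbb{R}$, with $\bar\Delta < 0$. By (4.5), this implies that

$$\lim_{k\in K, k\to\infty} \lambda^{(k)} = 0. \tag{4.6}$$

Denote by $\bar K \subseteq K$ a set of indices such that $\lim_{k\in\bar K, k\to\infty} \sigma^{(k)} = \bar\sigma$, $\lim_{k\in\bar K, k\to\infty} x^{(k)} = \bar x$ and $\lim_{k\in\bar K, k\to\infty} y^{(k)} = \bar y$ for some $\bar\sigma \in \mathcal{S}$, $\bar x, \bar y \in \Omega$. From (4.6) we have that, for any sufficiently large index $k \in \bar K$, Algorithm LS makes at least one reduction: this means that

$$\beta\frac{\lambda^{(k)}}{\delta}\Delta^{(k)} < f\left(x^{(k)} + \frac{\lambda^{(k)}}{\delta} d^{(k)}\right) - f(x^{(k)})$$

for all sufficiently large $k \in \bar K$. Repeating the same arguments employed in the first part of the proof, we obtain

$$\begin{aligned}
\beta\Delta^{(k)} &< \frac{f_0(x^{(k)} + (\lambda^{(k)}/\delta) d^{(k)}) - f_0(x^{(k)})}{\lambda^{(k)}/\delta} + f_1(y^{(k)}) - f_1(x^{(k)})\\
&\leq \frac{f_0(x^{(k)} + (\lambda^{(k)}/\delta) d^{(k)}) - f_0(x^{(k)})}{\lambda^{(k)}/\delta} + f_1(y^{(k)}) - f_1(x^{(k)}) + \gamma d_{\sigma^{(k)}}(y^{(k)}, x^{(k)}).
\end{aligned}$$

Taking limits on both sides for $k \in \bar K$, $k \to \infty$, since $\{d^{(k)}\}_{k\in\mathbb{N}} = \{y^{(k)} - x^{(k)}\}_{k\in\mathbb{N}}$ is bounded and by (4.6), we obtain $\beta\bar\Delta \leq \bar\Delta < 0$, which is a contradiction, since $0 < \beta < 1$. □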


We also prove the following useful lemma.


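Lemma 4.1 Let $\{x^{(k)}\}_{k\in\mathbb{N}}$, $\{y^{(k)}\}_{k\in\mathbb{N}}$ be two sequences of points in $\Omega$, $\{\sigma^{(k)}\}_{k\in\mathbb{N}}$ a sequence of parameters in $\mathcal{S} \subseteq \mathbb{R}^q$ and $\gamma \in [0, 1]$. Assume that

$$f(x^{(k+1)}) \leq f(x^{(k)} + \lambda^{(k)} d^{(k)}), \qquad d^{(k)} = y^{(k)} - x^{(k)}, \tag{4.7}$$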






where $y^{(k)}$ satisfies (4.3) and $\lambda^{(k)}$ is computed by Algorithm LS for any $k \in \mathbb{N}$. Suppose that $f$ is bounded from below. Then, we have

$$0 \leq -\sum_{k=0}^{\infty} \lambda^{(k)} h_{\sigma^{(k)},\gamma}(y^{(k)}, x^{(k)}) < \infty. \tag{4.8}$$

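Proof. Denote by $\ell \in \mathbb{R}$ a lower bound for $f$, i.e. $\ell \leq f(x)$ $\forall x \in \mathbb{R}^n$. Inequalities (4.1) and (4.7) can be combined as

$$-\beta\lambda^{(k)} h_{\sigma^{(k)},\gamma}(y^{(k)}, x^{(k)}) \leq f(x^{(k)}) - f(x^{(k+1)}).$$

Summing the previous inequality for $k = 0, \ldots, j$ gives

$$-\beta\sum_{k=0}^{j} \lambda^{(k)} h_{\sigma^{(k)},\gamma}(y^{(k)}, x^{(k)}) \leq \sum_{k=0}^{j}\left(f(x^{(k)}) - f(x^{(k+1)})\right) = f(x^{(0)}) - f(x^{(j+1)}) \leq f(x^{(0)}) - \ell. \tag{4.9}$$

Thus, inequality (4.8) follows. □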


4.1. A class of line-search based algorithms. Proposition 4.1 allows the convergence analysis of a wide class of descent methods based on the Armijo condition (4.1). The crucial ingredients of these methods are

• a descent direction $d^{(k)} = y^{(k)} - x^{(k)}$, where $y^{(k)}$ is a suitable approximation of the point $p(x^{(k)}; h_{\sigma^{(k)}})$;

• the sufficient decrease of the objective function between two successive iterations, which has to amount at least to $-\beta\lambda^{(k)} h_{\sigma^{(k)},\gamma}(y^{(k)}, x^{(k)})$, where $\lambda^{(k)}$ is determined by the backtracking procedure given in Algorithm LS.
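Putting the two ingredients together gives the following sketch of a line-search proximal gradient iteration with exact $y^{(k)} = p(x^{(k)}; h_{\sigma^{(k)}})$ and $x^{(k+1)} = x^{(k)} + \lambda^{(k)} d^{(k)}$, which satisfies (4.7) with equality. The test problem ($f_0(x) = \|x - c\|^2/2$, $f_1 = \tau\|\cdot\|_1$), the metric $D_k = I$, the alternating steplengths $\alpha_k \in \{0.5, 2\}$ and the stopping rule are illustrative assumptions of ours. Here $\alpha_k = 2$ exceeds $1/L = 1$, and the backtracking compensates, in line with the remark in the introduction that $\alpha_k$ need not be tied to the Lipschitz constant.

```python
import math


def soft(v, t):
    return math.copysign(max(abs(v) - t, 0.0), v)


def vm_prox_grad_ls(c, tau, x0, alphas, gamma=1.0, beta=1e-4, delta=0.5,
                    max_iter=100, tol=1e-12):
    """Line-search proximal gradient sketch for ||x - c||^2/2 + tau*||x||_1 (D_k = I)."""
    f = lambda v: 0.5 * sum((vi - ci) ** 2 for vi, ci in zip(v, c)) + tau * sum(map(abs, v))
    x, history = list(x0), [f(x0)]
    for k in range(max_iter):
        alpha = alphas[k % len(alphas)]            # freely chosen steplength alpha_k
        g = [xi - ci for xi, ci in zip(x, c)]      # grad f0(x)
        y = [soft(xi - alpha * gi, alpha * tau) for xi, gi in zip(x, g)]  # y = p(x; h_sigma)
        d = [yi - xi for yi, xi in zip(y, x)]
        Delta = (sum(gi * di for gi, di in zip(g, d))
                 + gamma * sum(di * di for di in d) / (2.0 * alpha)
                 + tau * (sum(map(abs, y)) - sum(map(abs, x))))   # (4.2)
        if Delta > -tol:                            # Delta = 0 <=> x stationary (Prop. 3.3)
            break
        lam, fx = 1.0, history[-1]
        while f([xi + lam * di for xi, di in zip(x, d)]) > fx + beta * lam * Delta:
            lam *= delta                            # Algorithm LS
        x = [xi + lam * di for xi, di in zip(x, d)]
        history.append(f(x))
    return x, history


x, hist = vm_prox_grad_ls(c=[3.0, 0.2, -2.0], tau=1.0, x0=[0.0, 0.0, 0.0], alphas=[0.5, 2.0])
# The minimizer is soft(c, tau) = [2.0, 0.0, -1.0]; f(x^(k)) is nonincreasing.
```

On this small problem the second step, taken with $\alpha_1 = 2$, is rejected at $\lambda = 1$ and accepted at $\lambda = 1/2$, which lands exactly on the minimizer.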

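Theorem 4.1 Let $\{x^{(k)}\}_{k\in\mathbb{N}}$, $\{y^{(k)}\}_{k\in\mathbb{N}}$ be two sequences of points in $\Omega$, $\{\sigma^{(k)}\}_{k\in\mathbb{N}} \subseteq \mathcal{S}$ and $\gamma \in [0, 1]$. Assume that there exists a limit point $\bar x$ of $\{x^{(k)}\}_{k\in\mathbb{N}}$ and let $K' \subseteq \mathbb{N}$ be a subset of indices such that $\lim_{k\in K', k\to\infty} x^{(k)} = \bar x \in \Omega$. Assume that, for any $k \in \mathbb{N}$, we have

$$f(x^{(k+1)}) \leq f(x^{(k)} + \lambda^{(k)} d^{(k)}), \qquad d^{(k)} = y^{(k)} - x^{(k)},$$

where $\lambda^{(k)}$ is computed by Algorithm LS, $y^{(k)}$ satisfies (4.3), and there exists $K'' \subseteq K'$ such that

$$\lim_{k\in K'', k\to\infty} h_{\sigma^{(k)}}(y^{(k)}, x^{(k)}) - h_{\sigma^{(k)}}(\tilde y^{(k)}, x^{(k)}) = 0, \quad \text{with } \tilde y^{(k)} = p(x^{(k)}; h_{\sigma^{(k)}}). \tag{4.10}$$

Then $\bar x$ is a stationary point for problem (1.1).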

Proof. First, we notice that Algorithm LS is well defined, since (4.3) holds. We observe that, since $h_{\sigma^{(k)}}(\cdot, x^{(k)})$ is strongly convex with modulus of convexity $m$ and $\tilde y^{(k)}$ is its minimum point, we have

$$\frac{m}{2}\|z - \tilde y^{(k)}\|^2 \leq h_{\sigma^{(k)}}(z, x^{(k)}) - h_{\sigma^{(k)}}(\tilde y^{(k)}, x^{(k)}) \quad \forall z \in \mathbb{R}^n. \tag{4.11}$$

Setting $z = y^{(k)}$ in the previous inequality and using (4.10) gives

$$\lim_{k\in K'', k\to\infty} \|y^{(k)} - \tilde y^{(k)}\| = 0. \tag{4.12}$$

By continuity of the operator $p(x; h_\sigma)$, since $\{x^{(k)}\}_{k\in K'}$ is bounded, $\{\tilde y^{(k)}\}_{k\in K'}$ is bounded as well. Thus, (4.12) implies that $\{y^{(k)}\}_{k\in K''}$ is also bounded and there exists a limit point $\bar y$ of $\{y^{(k)}\}_{k\in K''}$. We define $K \subseteq K''$ such that $\lim_{k\in K, k\to\infty} y^{(k)} = \bar y$ and $\lim_{k\in K, k\to\infty} \sigma^{(k)} = \bar\sigma$. By continuity of the operator $p(x; h_\sigma)$ with respect to all its arguments, (4.12) implies that $\bar y = p(\bar x; h_{\bar\sigma})$.

Consider now the sequence $\{f(x^{(k)})\}_{k\in\mathbb{N}}$. From assumption (4.7), together with (4.1) and (4.3), it follows that

$$f(x^{(k+1)}) \leq f(x^{(k)} + \lambda^{(k)} d^{(k)}) \leq f(x^{(k)}). \tag{4.13}$$

Thus, the sequence $\{f(x^{(k)})\}_{k\in\mathbb{N}}$ is monotone nonincreasing and, therefore, it converges to some $\bar f \in \bar{\mathbb{R}}$. Since $f$ is lower semicontinuous and $\bar x$ is a limit point of $\{x^{(k)}\}_{k\in\mathbb{N}}$, we have

$$\bar f = \lim_{k\to\infty} f(x^{(k)}) = \lim_{k\in K', k\to\infty} f(x^{(k)}) \geq f(\bar x).$$

The previous inequality implies that $\bar f \in \mathbb{R}$ and this fact, together with inequality (4.13), gives

$$\lim_{k\to\infty} f(x^{(k)}) - f(x^{(k)} + \lambda^{(k)} d^{(k)}) = 0.$$

Thus we can apply Proposition 4.1 and obtain

$$\lim_{k\to\infty, k\in K} h_{\sigma^{(k)},\gamma}(y^{(k)}, x^{(k)}) = 0.$$

Combining the previous equality with (3.6) and (4.10) yields

$$0 = \lim_{k\to\infty, k\in K} h_{\sigma^{(k)},\gamma}(y^{(k)}, x^{(k)}) \leq \lim_{k\to\infty, k\in K} h_{\sigma^{(k)}}(y^{(k)}, x^{(k)}) = \lim_{k\to\infty, k\in K} h_{\sigma^{(k)}}(\tilde y^{(k)}, x^{(k)}).$$

Since $h_{\sigma^{(k)}}(\tilde y^{(k)}, x^{(k)}) \leq 0$, this implies $\lim_{k\to\infty, k\in K} h_{\sigma^{(k)}}(\tilde y^{(k)}, x^{(k)}) = 0$. Expressing inequality (4.11) for $z = x^{(k)}$, we can write

$$\frac{m}{2}\|x^{(k)} - \tilde y^{(k)}\|^2 \leq h_{\sigma^{(k)}}(x^{(k)}, x^{(k)}) - h_{\sigma^{(k)}}(\tilde y^{(k)}, x^{(k)}) = -h_{\sigma^{(k)}}(\tilde y^{(k)}, x^{(k)}) \to 0.$$

Thus, we proved that $\bar y = \bar x$ and, by Proposition 3.3, $\bar x$ is stationary.

□


Let us now discuss assumption (4.10) in the previous theorem, concerning the inexact solution of the minimum problem in (3.4). Assumption (4.3) guarantees that $d^{(k)} = y^{(k)} - x^{(k)}$ is a descent direction, which is needed for the line-search algorithm. However, it is not sufficient to ensure that the limit points are stationary, and we also need to assume that (4.10) holds.

As a counterexample, consider the case $n = 1$, $f_0(x) = x^2/2$, $f_1(x) = 0$, $d_\sigma(x, y) = (x - y)^2/2$, $\beta = \delta = 1/2$. The sequence $x^{(k+1)} = x^{(k)} + \lambda^{(k)}(y^{(k)} - x^{(k)})$ with $\lambda^{(k)} = 1$, $y^{(k)} = x^{(k)} - (1/2)^{k+1}$ satisfies all the assumptions of Theorem 4.1 except (4.10). However, starting from $x^{(0)} = 2$, the sequence reads $x^{(k)} = 1 + (1/2)^k$, $k \geq 0$, which converges to $1$, while the only stationary point is $0$.
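The counterexample can be replayed numerically. The sketch below (our code; the iteration count is kept moderate so that floating-point rounding does not blur the Armijo test) checks (4.3) and the acceptance of $\lambda^{(k)} = 1$ in (4.1) with $\gamma = 1$, and monitors the quantity in (4.10), which here equals $(y^{(k)})^2/2$ because $p(x; h_\sigma) = x - \nabla f_0(x) = 0$.

```python
def replay_counterexample(num_iter=30, beta=0.5, gamma=1.0):
    f = lambda t: 0.5 * t * t                                  # f0(t) = t^2/2, f1 = 0
    h = lambda y, x, g: x * (y - x) + g * 0.5 * (y - x) ** 2   # h_{sigma,g}(y, x)
    x, xs, gaps = 2.0, [2.0], []
    for k in range(num_iter):
        y = x - 0.5 ** (k + 1)
        Delta = h(y, x, gamma)
        assert Delta < 0                            # (4.3): descent direction
        assert f(y) <= f(x) + beta * 1.0 * Delta    # (4.1) accepts lambda = 1
        gaps.append(h(y, x, 1.0) - h(0.0, x, 1.0))  # quantity in (4.10); p(x) = 0
        x = y
        xs.append(x)
    return xs, gaps


xs, gaps = replay_counterexample()
# xs[k] == 1 + 0.5**k, so x^(k) -> 1 although f0'(1) = 1 != 0, and gaps[-1] stays near 1/2.
```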

We remark that assumption (4.10) could be replaced by requiring that $f_1$ is continuous and (4.12) holds. Clearly, (4.10) cannot be checked directly, but it is very general. In the following sections, we will consider two implementable conditions which imply (4.10), and in Sections 5.1-5.4 we show how $y^{(k)}$ can be computed in practice without knowing $p(x^{(k)}; h_{\sigma^{(k)}})$.

4.2. $\epsilon$-approximations. In this section we will assume that $d_\sigma$ has the form (3.1) and, in this case, we will describe a sufficient condition for (4.10).

We observe that $\tilde y = p(x; h_\sigma) = \mathrm{prox}_{f_1}^{\alpha^{-1}D}(x - \alpha D^{-1}\nabla f_0(x))$ if and only if $0 \in \partial h_\sigma(\tilde y, x)$, that is

$$\frac{1}{\alpha}D(z - \tilde y) \in \partial f_1(\tilde y), \tag{4.14}$$

where $z = x - \alpha D^{-1}\nabla f_0(x)$. Borrowing the ideas in [31, 33], we consider a relaxed version of (4.14) and we study the properties of any point $y$ satisfying the following inclusion

$$\frac{1}{\alpha}D(z - y) \in \partial_\epsilon f_1(y), \tag{4.15}$$

where $\epsilon \in \mathbb{R}_{\geq 0}$.
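
Lemma 4.2 Let $d_\sigma$ be defined as in (3.1) and $x \in \Omega$. Assume that $\tilde y = p(x; h_\sigma)$ and that $y$ satisfies (4.15) for some $\epsilon \in \mathbb{R}_{\geq 0}$. Then $y \in \Omega$ and we have

(a) $h_\sigma(y, x) - h_\sigma(\tilde y, x) \leq \epsilon$;

(b) $\|y - \tilde y\|^2 \leq \alpha\mu\epsilon$, for all $\mu \in \mathbb{R}_{>0}$ with $\frac{1}{\mu} \leq \lambda_{\min}(D)$, $\lambda_{\min}(D)$ being the smallest eigenvalue of $D$.

Proof. Since $\partial_\epsilon f_1(y) \neq \emptyset$ by (4.15) and $f_1$ is proper, we have $y \in \Omega$. Since $\partial_\epsilon h_\sigma(y, x) \supseteq \{\frac{1}{\alpha}D(y - z) + w : w \in \partial_\epsilon f_1(y)\}$ (see [35, Theorem 2.4.2 (viii)]), inclusion (4.15) implies $0 \in \partial_\epsilon h_\sigma(y, x)$ which, by definition (2.4) of the $\epsilon$-subdifferential, is equivalent to

$$h_\sigma(w, x) \geq h_\sigma(y, x) - \epsilon \quad \forall w \in \mathbb{R}^n. \tag{4.16}$$

Choosing $w = \tilde y$ in (4.16) gives (a). As for (b), write (2.4) for $\frac{1}{\alpha}D(z - y) \in \partial_\epsilon f_1(y)$ at the point $\tilde y$, and the subgradient inequality for $\frac{1}{\alpha}D(z - \tilde y) \in \partial f_1(\tilde y)$ (see (4.14)) at the point $y$; summing the two inequalities yields

$$\frac{1}{\alpha}\|y - \tilde y\|_D^2 = \frac{1}{\alpha}(y - \tilde y)^T D (y - \tilde y) \leq \epsilon.$$

Since $\|y - \tilde y\|_D^2 \geq \lambda_{\min}(D)\|y - \tilde y\|^2 \geq \frac{1}{\mu}\|y - \tilde y\|^2$, (b) follows. □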

The previous result combined with Theorem 4.1 directly implies the following Corollary. 

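Corollary 4.1 Let $0 < \alpha_{\min} \leq \alpha_{\max}$, $\gamma \in [0, 1]$, $\mu \geq 1$. Assume that $\{\alpha_k\}_{k\in\mathbb{N}} \subseteq [\alpha_{\min}, \alpha_{\max}]$, $\{D_k\}_{k\in\mathbb{N}} \subseteq \mathcal{M}_\mu$, $\{\epsilon_k\}_{k\in\mathbb{N}} \subseteq \mathbb{R}_{\geq 0}$, $\lim_{k\to\infty}\epsilon_k = 0$. Let $\{x^{(k)}\}_{k\in\mathbb{N}}$, $\{y^{(k)}\}_{k\in\mathbb{N}}$ be two sequences of points in $\Omega$ such that, for any $k \in \mathbb{N}$, (4.7) holds, where $\lambda^{(k)}$ is computed by Algorithm LS and $y^{(k)}$ satisfies (4.3) and

$$\frac{1}{\alpha_k}D_k(z^{(k)} - y^{(k)}) \in \partial_{\epsilon_k} f_1(y^{(k)}), \tag{4.17}$$

with $z^{(k)} = x^{(k)} - \alpha_k D_k^{-1}\nabla f_0(x^{(k)})$. Then any limit point of the sequence $\{x^{(k)}\}_{k\in\mathbb{N}}$ is stationary for problem (1.1).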
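For a one-dimensional illustration of (4.15) and Lemma 4.2 (our example), take $f_1 = \tau|\cdot|$ and $D = d > 0$. A short computation from (2.4) gives $\partial_\epsilon f_1(y) = \{w : |w| \leq \tau,\ \tau|y| - wy \leq \epsilon\}$, so the smallest $\epsilon$ for which a candidate $y$ satisfies (4.15) is explicit. The sketch computes it and checks both parts of Lemma 4.2 with $\mu = 1/d$; the function names are hypothetical.

```python
import math


def soft(v, t):
    return math.copysign(max(abs(v) - t, 0.0), v)


def smallest_eps(y, z, alpha, dval, tau):
    """Smallest eps with (dval/alpha)*(z - y) in the eps-subdifferential of tau*|.| at y."""
    w = dval * (z - y) / alpha
    if abs(w) > tau:
        return math.inf                      # (4.15) fails for every eps
    return max(tau * abs(y) - w * y, 0.0)


alpha, dval, tau, z = 1.0, 1.0, 1.0, 3.0
y_exact = soft(z, alpha * tau / dval)        # exact point p(x; h_sigma), eps = 0
mu = 1.0 / dval                              # 1/mu <= lambda_min(D) = dval
# h_sigma(v, x) up to an additive constant, since z = x - alpha * grad f0(x) / dval.
phi = lambda v: dval * (v - z) ** 2 / (2.0 * alpha) + tau * abs(v)
for y in (2.0, 2.05, 2.1, 1.9):
    eps = smallest_eps(y, z, alpha, dval, tau)
    if eps < math.inf:
        assert phi(y) - phi(y_exact) <= eps + 1e-15             # Lemma 4.2(a)
        assert (y - y_exact) ** 2 <= alpha * mu * eps + 1e-15   # Lemma 4.2(b)
```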

4.3. $\eta$-approximations. A different approach to define a suitable approximation of the operator (3.4) is based on the following definition:

$$\mathcal{P}_\eta(x; h_\sigma) = \{y \in \Omega : h_\sigma(y, x) \leq \eta\, h_\sigma(\tilde y, x), \text{ where } \tilde y = p(x; h_\sigma)\} \tag{4.18}$$

for some $\eta \in (0, 1]$. This idea of inexactness was first introduced in [7] to approximate the projection operator onto a convex set in the context of scaled gradient projection methods for smooth optimization. Clearly, if

$$y \in \mathcal{P}_\eta(x; h_\sigma), \tag{4.19}$$

then $h_\sigma(y, x) \leq 0$, and $h_\sigma(y, x) = 0$ if and only if $h_\sigma(\tilde y, x) = 0$, which implies $y = \tilde y$.
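Membership in $\mathcal{P}_\eta(x; h_\sigma)$ only requires comparing two values of $h_\sigma(\cdot, x)$. The sketch below tests (4.18) in a one-dimensional setting of our choosing ($f_0(v) = (v - c)^2/2$, $f_1 = \tau|\cdot|$, $d_\sigma$ as in (3.1) with $D = d$). It computes $\tilde y$ exactly, which is possible only in such toy cases; in practice $\tilde y$ is not available (Sections 5.1-5.4 address this), so the check is purely didactic.

```python
import math


def soft(v, t):
    return math.copysign(max(abs(v) - t, 0.0), v)


def h_sigma(y, x, c, alpha, dval, tau):
    """h_sigma(y, x) from (3.2) for f0 = (v - c)**2/2, f1 = tau*|v|, d_sigma from (3.1)."""
    return (x - c) * (y - x) + dval * (y - x) ** 2 / (2.0 * alpha) + tau * (abs(y) - abs(x))


def in_P_eta(y, x, c, alpha, dval, tau, eta):
    """Test (4.18): h_sigma(y, x) <= eta * h_sigma(y_tilde, x)."""
    y_tilde = soft(x - alpha * (x - c) / dval, alpha * tau / dval)   # p(x; h_sigma)
    return h_sigma(y, x, c, alpha, dval, tau) <= eta * h_sigma(y_tilde, x, c, alpha, dval, tau)


# x = 0, c = 3: y_tilde = 2 with h_sigma(2, 0) = -2, while h_sigma(1, 0) = -1.5.
checks = {eta: in_P_eta(1.0, 0.0, 3.0, 1.0, 1.0, 1.0, eta) for eta in (0.7, 0.75, 0.8)}
# y = 1 achieves 75% of the optimal decrease: True for eta = 0.7, 0.75 and False for 0.8.
```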

The following theorem establishes a convergence result under the condition $y^{(k)} \in \mathcal{P}_\eta(x^{(k)}; h_{\sigma^{(k)}})$.


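Theorem 4.2 Let $\eta \in (0, 1]$, $0 \leq \gamma \leq 1$, $\{\sigma^{(k)}\}_{k\in\mathbb{N}} \subseteq \mathcal{S}$ and $\{x^{(k)}\}_{k\in\mathbb{N}} \subseteq \Omega$ satisfying (4.7), where $\lambda^{(k)}$ is computed by Algorithm LS, with

$$y^{(k)} \in \mathcal{P}_\eta(x^{(k)}; h_{\sigma^{(k)}}). \tag{4.20}$$

Then either for some $k$ the iterate $x^{(k)}$ is stationary for problem (1.1), or any limit point $\bar x$ of $\{x^{(k)}\}_{k\in\mathbb{N}}$ is stationary for problem (1.1).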

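Proof. We set $\hat y^{(k)} = p(x^{(k)}; h_{\sigma^{(k)}})$ and we first observe that $\gamma \le 1$ and (4.20) imply

$$h_{\sigma^{(k)},\gamma}(\tilde y^{(k)},x^{(k)}) \le h_{\sigma^{(k)}}(\tilde y^{(k)},x^{(k)}) \le \eta\, h_{\sigma^{(k)}}(\hat y^{(k)},x^{(k)}) \le 0. \tag{4.21}$$

If at some iterate $k \in \mathbb{N}$ we have $h_{\sigma^{(k)},\gamma}(\tilde y^{(k)},x^{(k)}) = 0$ and, as a consequence, $h_{\sigma^{(k)}}(\hat y^{(k)},x^{(k)}) = 0$, then, by Proposition 3.3, $x^{(k)}$ is a stationary point for problem (1.1).

Otherwise $h_{\sigma^{(k)},\gamma}(\tilde y^{(k)},x^{(k)}) < 0$ for all $k \in \mathbb{N}$ and, thus, (4.3) holds. Consider now a limit point $\bar x \in \Omega$ of $\{x^{(k)}\}_{k\in\mathbb{N}}$ (if one exists) such that $\lim_{k\to\infty,k\in K'} x^{(k)} = \bar x$ for some set of indices $K' \subseteq \mathbb{N}$.

We first prove that $\{\tilde y^{(k)}\}_{k\in K'}$ is bounded, using the strong convexity of $h_{\sigma^{(k)}}(\cdot,x^{(k)})$. From (4.20) we have

$$h_{\sigma^{(k)}}(\tilde y^{(k)},x^{(k)}) - h_{\sigma^{(k)}}(\hat y^{(k)},x^{(k)}) \le (\eta-1)\, h_{\sigma^{(k)}}(\hat y^{(k)},x^{(k)}). \tag{4.22}$$

Since $h_{\sigma^{(k)}}(\cdot,x^{(k)})$ is strongly convex with modulus of convexity $m$, and $\hat y^{(k)}$ is the minimizer of $h_{\sigma^{(k)}}(\cdot,x^{(k)})$, we can write

$$\frac{m}{2}\|\tilde y^{(k)} - \hat y^{(k)}\|^2 \le h_{\sigma^{(k)}}(\tilde y^{(k)},x^{(k)}) - h_{\sigma^{(k)}}(\hat y^{(k)},x^{(k)}) \le (\eta-1)\, h_{\sigma^{(k)}}(\hat y^{(k)},x^{(k)}).$$

Since $\hat y^{(k)}$ depends continuously on $x^{(k)}$, when $\{x^{(k)}\}_{k\in K'}$ is bounded and all its elements lie in a closed set, then $\{\hat y^{(k)}\}_{k\in K'}$ is also bounded. Recalling Proposition 4.1, we have that $\{h_{\sigma^{(k)},\gamma}(\tilde y^{(k)},x^{(k)})\}_{k\in K'}$ is bounded from below; then, using inequalities (4.21), we can conclude that $h_{\sigma^{(k)}}(\hat y^{(k)},x^{(k)})$ is also bounded from below for $k \in K'$ and, thus, $\{\tilde y^{(k)}\}_{k\in K'}$ is bounded. We define $K \subseteq K'$ as a set of indices such that $\lim_{k\to\infty,k\in K}\sigma^{(k)} = \bar\sigma$ and $\lim_{k\to\infty,k\in K}\tilde y^{(k)} = \bar y$ for some $\bar\sigma \in \mathcal{S}$, $\bar y \in \Omega$. Such a set $K$ exists, since the sequences $\{\tilde y^{(k)}\}_{k\in K'}$ and $\{\sigma^{(k)}\}_{k\in\mathbb{N}}$ are bounded; moreover, thanks to the continuity of the operator (3.4), $\lim_{k\to\infty,k\in K}\hat y^{(k)} = \hat y$, with $\hat y = p(\bar x; h_{\bar\sigma})$. Reasoning as in the proof of Theorem 4.1, the existence of a limit point guarantees that (4.4) is satisfied. Then, by Proposition 4.1, we obtain $\lim_{k\to\infty,k\in K} h_{\sigma^{(k)},\gamma}(\tilde y^{(k)},x^{(k)}) = 0$. Combining this with (4.21), we also have

$$0 = \lim_{k\to\infty,k\in K} h_{\sigma^{(k)},\gamma}(\tilde y^{(k)},x^{(k)}) \le \lim_{k\to\infty,k\in K} h_{\sigma^{(k)}}(\tilde y^{(k)},x^{(k)}) \le \eta \lim_{k\to\infty,k\in K} h_{\sigma^{(k)}}(\hat y^{(k)},x^{(k)}),$$

which, since $h_{\sigma^{(k)}}(\hat y^{(k)},x^{(k)}) \le 0$, implies

$$\lim_{k\to\infty,k\in K} h_{\sigma^{(k)}}(\hat y^{(k)},x^{(k)}) = 0. \tag{4.23}$$

Invoking again the strong convexity of $h_{\sigma^{(k)}}(\cdot,x^{(k)})$, we obtain

$$\frac{m}{2}\|x^{(k)} - \hat y^{(k)}\|^2 \le h_{\sigma^{(k)}}(x^{(k)},x^{(k)}) - h_{\sigma^{(k)}}(\hat y^{(k)},x^{(k)}) = -h_{\sigma^{(k)}}(\hat y^{(k)},x^{(k)}),$$

which, together with (4.23), gives $\lim_{k\to\infty,k\in K}\|\hat y^{(k)} - x^{(k)}\|^2 = 0$. Thus, $\hat y = \bar x$ and, by Proposition 3.3, we conclude that $\bar x$ is stationary. □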


4.4. Remarks. Different notions of inexactness have been proposed in the literature (see [31, 33] and references therein), especially in the context of proximal point methods, with the aim of approximating the resolvent operator, and some of them could also be considered in our framework. A synthetic description of possible inexactness notions and their relationships is given in Figure 4.1.

It is difficult to insert the inexactness criterion (4.19) in the scheme in Figure 4.1, since the shape of $P_\eta$ in (4.19) depends on $x$, while the implications in Figure 4.1 are independent of $x$.

In general, we observe that from inequality (4.22) and by definition of the ε-subdifferential we have

$$0 \in \partial_{\epsilon_k} h_{\sigma^{(k)}}(\tilde y^{(k)},x^{(k)}) \quad\text{with}\quad \epsilon_k = (\eta-1)\, h_{\sigma^{(k)}}(\hat y^{(k)},x^{(k)}).$$

Fig. 4.1. Connection of different inexactness notions, under the assumption (3.1). The proofs of the implications are given in Lemma 4.2 and in [31, Proposition 1].

Fig. 4.2. Example with $f_1(x) = \iota_\Omega(x)$, $d_\sigma$ as in (3.1) with $\alpha = 1$, $D = I$. Left panel: in yellow, the set $P_\eta(x; h_\sigma)$ defined in (4.18). Right panel: in yellow, the set of points $\tilde y$ satisfying (4.15).

We give a pictorial example of the sets of admissible approximations $\tilde y$ of the exact minimizer $\hat y$ defined by conditions (4.19) and (4.15) in Figure 4.2. This example refers to the case where $f_1(x) = \iota_\Omega(x)$ is the indicator function of a convex closed set $\Omega \subset \mathbb{R}^n$. Choosing the Euclidean metric, i.e. (3.1) with $D = I$, $\alpha = 1$, as distance function, the operator $p(x; h_\sigma)$ reduces to the Euclidean projection of the point $z = x - \nabla f_0(x)$ onto $\Omega$. Moreover, condition (4.15) becomes

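$$\tilde y \in \Omega \quad\text{and}\quad (w - \tilde y)^T(z - \tilde y) \le \epsilon \quad \forall w \in \Omega. \tag{4.24}$$

As well explained in [31, 33], from a geometrical point of view, a point $\tilde y \in \Omega$ satisfies (4.24) if and only if $\Omega$ is contained in the negative half-space determined by the hyperplane of equation $(w - \tilde y)^T(z - \tilde y)/\|z - \tilde y\| = \epsilon/\|z - \tilde y\|$, which is normal to $z - \tilde y$ at a distance $\epsilon/\|z - \tilde y\|$ from $\tilde y$.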
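For a box $\Omega = [lo, hi]$ (a hypothetical choice made only for this sketch, not the set drawn in Figure 4.2), the smallest $\epsilon$ for which a feasible $\tilde y$ satisfies (4.24) has a closed form, because a linear function attains its maximum over a box coordinate by coordinate at one of the two bounds. The helper name `eps_needed` is ours. In this Euclidean setting ($\alpha = 1$, $D = I$) the strong convexity argument used in the proof of Lemma 4.2 gives $\|\tilde y - \hat y\|^2 \le 2\epsilon$, which the usage lines check on random data.

```python
import numpy as np


def eps_needed(y_tilde, z, lo, hi):
    """Smallest eps such that a feasible y_tilde satisfies (4.24) on the box [lo, hi].

    This is max_{w in [lo, hi]} (w - y_tilde)^T (z - y_tilde); a linear function is
    maximised over a box coordinate by coordinate, at either lo_i or hi_i.
    """
    s = z - y_tilde
    return float(np.sum(np.maximum(s * (lo - y_tilde), s * (hi - y_tilde))))


rng = np.random.default_rng(0)
n = 5
lo, hi = np.zeros(n), np.ones(n)
z = 2.0 * rng.standard_normal(n)        # plays the role of z = x - grad f0(x) (illustrative data)
y_hat = np.clip(z, lo, hi)              # exact Euclidean projection of z onto the box

eps_exact = eps_needed(y_hat, z, lo, hi)          # the exact projection needs eps = 0
y_tilde = rng.uniform(0.0, 1.0, n)                # an arbitrary feasible candidate
eps = eps_needed(y_tilde, z, lo, hi)
print(eps_exact, eps, np.sum((y_tilde - y_hat) ** 2) <= 2 * eps)
```

Any feasible point is an ε-approximation for a large enough ε; the value returned by `eps_needed` measures how far the candidate is from satisfying the exact optimality condition.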

On the other hand, setting $\gamma = 1$ for simplicity, we have $h_{\sigma,\gamma}(\cdot,x) = h_\sigma(\cdot,x) = \frac12\|\cdot - z\|^2 - \frac12\|x - z\|^2 + \iota_\Omega(\cdot) - \iota_\Omega(x)$. Thus, the set $P_\eta(x; h_\sigma)$ is the intersection of the set $\Omega$ with the ball centered in $z$ of radius $\sqrt{\eta\|\hat y - z\|^2 + (1-\eta)\|x - z\|^2}$.
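The ball description can be checked numerically against definition (4.18). The sketch below assumes, for illustration only, that $\Omega$ is a box, $\alpha = 1$, $D = I$, $\gamma = 1$ and $x \in \Omega$; the function names `h_sigma`, `in_P_eta` and `in_ball` are hypothetical. Because the indicator terms vanish at feasible points, the two membership tests compare algebraically equivalent quantities and should agree on every sample.

```python
import numpy as np


def h_sigma(y, x, z):
    """h_sigma(y, x) for alpha = 1, D = I and feasible x, y (the indicator terms vanish)."""
    return 0.5 * np.sum((y - z) ** 2) - 0.5 * np.sum((x - z) ** 2)


def in_P_eta(y_tilde, x, z, lo, hi, eta):
    """Membership test from definition (4.18): h(y_tilde, x) <= eta * h(y_hat, x)."""
    y_hat = np.clip(z, lo, hi)            # exact minimiser: projection of z onto the box
    return h_sigma(y_tilde, x, z) <= eta * h_sigma(y_hat, x, z)


def in_ball(y_tilde, x, z, lo, hi, eta):
    """Equivalent test: ||y_tilde - z||^2 <= eta*||y_hat - z||^2 + (1 - eta)*||x - z||^2."""
    y_hat = np.clip(z, lo, hi)
    r2 = eta * np.sum((y_hat - z) ** 2) + (1.0 - eta) * np.sum((x - z) ** 2)
    return np.sum((y_tilde - z) ** 2) <= r2


rng = np.random.default_rng(1)
n, eta = 3, 0.5
lo, hi = np.zeros(n), np.ones(n)
x = rng.uniform(0.0, 1.0, n)              # current (feasible) iterate
z = x - rng.standard_normal(n)            # z = x - grad f0(x), with an illustrative gradient
samples = rng.uniform(0.0, 1.0, (1000, n))
agree = all(in_P_eta(y, x, z, lo, hi, eta) == in_ball(y, x, z, lo, hi, eta) for y in samples)
print(agree)
```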

In general, one of the main differences between definitions (4.20) and (4.17) consists in the fact that in the latter case the distance between the approximated and the exact minimum point of $h_{\sigma^{(k)}}(\cdot,x^{(k)})$, i.e. $\|\tilde y^{(k)} - \hat y^{(k)}\|$, can be controlled by the independent parameter $\epsilon_k$ (see Lemma 4.2 (b)), while in the other case this distance is algorithm and iteration dependent. This fact can be exploited to obtain a stronger convergence result, as shown in the next section.

4.5. Convergence analysis in the convex case with ε-approximations.

4.5.1. Convergence. In this section, we assume that $f_0$ is convex and, in this case, we prove a stronger convergence result for a specific line-search algorithm where the descent direction is defined by means of an ε-approximation, provided that the sequence of parameters $\{\epsilon_k\}_{k\in\mathbb{N}}$ is summable and that the sequence of the matrices $D_k$ satisfies suitable assumptions. The following theorem is a generalization of Theorem 3.1 in [9]. Further results on forward-backward variable metric algorithms which apply to problems of the form (1.1) when $f_0$ has Lipschitz continuous gradient can be found in the recent papers [14, 17]. We stress that our convergence analysis does not require Lipschitz continuity of the gradient of $f_0$ and, moreover, it only requires the sequence of errors $\|\tilde y^{(k)} - \hat y^{(k)}\|$ to be square summable, while the convergence result stated in [14, Theorem 4.1] is given under the stronger assumption that $\|\tilde y^{(k)} - \hat y^{(k)}\|$ is summable.


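Theorem 4.3 Let $0 < \alpha_{\min} \le \alpha_{\max}$, $\gamma \in [0,1]$, $\{\alpha_k\}_{k\in\mathbb{N}} \subset [\alpha_{\min},\alpha_{\max}]$. Assume that $f_0$ in (1.1) is convex and the solution set $X^*$ of problem (1.1) is not empty. Let $\{x^{(k)}\}_{k\in\mathbb{N}}$ be the sequence generated as

$$x^{(k+1)} = x^{(k)} + \lambda^{(k)} d^{(k)}, \qquad d^{(k)} = \tilde y^{(k)} - x^{(k)},$$

where $\lambda^{(k)}$ is obtained by means of the backtracking procedure in Algorithm LS, with $\tilde y^{(k)}$ satisfying $h_{\sigma^{(k)},\gamma}(\tilde y^{(k)},x^{(k)}) < 0$. Moreover, assume that:

(H1) $\tilde y^{(k)}$ satisfies (4.17), where the sequence $\{\epsilon_k\}_{k\in\mathbb{N}}$ is summable, i.e. $\sum_{k=0}^{\infty}\epsilon_k < \infty$;

(H2) $\{D_k\}_{k\in\mathbb{N}} \subset \mathcal{M}_\mu$, where $\mu \ge 1$ and

$$D_{k+1} \preceq (1+\zeta_k) D_k, \qquad \{\zeta_k\}_{k\in\mathbb{N}} \subset \mathbb{R}_{\ge 0}, \qquad \sum_{k=0}^{\infty}\zeta_k < \infty.$$

Then the sequence $\{x^{(k)}\}_{k\in\mathbb{N}}$ converges to a solution of (1.1).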

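Proof. First of all we recall the basic norm equality

$$\|a - b\|_D^2 + \|b - c\|_D^2 - \|a - c\|_D^2 = 2(a - b)^T D(c - b), \tag{4.25}$$

which holds for any $a, b, c \in \mathbb{R}^n$. Let $\bar x \in X^*$. By definition of $\tilde y^{(k)}$ we have

$$f_1(w) \ge f_1(\tilde y^{(k)}) + \frac{1}{\alpha_k}(z^{(k)} - \tilde y^{(k)})^T D_k (w - \tilde y^{(k)}) - \epsilon_k \quad \forall w \in \mathbb{R}^n,$$

which, recalling that $z^{(k)} = x^{(k)} - \alpha_k D_k^{-1}\nabla f_0(x^{(k)})$, can also be written as

$$(\tilde y^{(k)} - x^{(k)})^T D_k (w - \tilde y^{(k)}) \ge \alpha_k\big(f_1(\tilde y^{(k)}) - f_1(w) + \nabla f_0(x^{(k)})^T(\tilde y^{(k)} - w)\big) - \alpha_k\epsilon_k \quad \forall w \in \mathbb{R}^n.$$

For $w = \bar x$, the previous inequality gives

$$\begin{aligned}
(\tilde y^{(k)} - x^{(k)})^T D_k (\bar x - x^{(k)}) &\ge \alpha_k\big(f_1(\tilde y^{(k)}) - f_1(\bar x) + \nabla f_0(x^{(k)})^T(x^{(k)} - \bar x)\big) - \alpha_k\epsilon_k \\
&\quad + \big(\tilde y^{(k)} - x^{(k)} + \alpha_k D_k^{-1}\nabla f_0(x^{(k)})\big)^T D_k(\tilde y^{(k)} - x^{(k)}) \\
&\ge \alpha_k\big(f_1(\tilde y^{(k)}) - f_1(x^{(k)}) + f(x^{(k)}) - f(\bar x)\big) + \|\tilde y^{(k)} - x^{(k)}\|_{D_k}^2 \\
&\quad + \alpha_k\nabla f_0(x^{(k)})^T(\tilde y^{(k)} - x^{(k)}) - \alpha_k\epsilon_k \qquad \text{(4.26)} \\
&\ge \|\tilde y^{(k)} - x^{(k)}\|_{D_k}^2 - \alpha_k\epsilon_k + \alpha_k\big(f_1(\tilde y^{(k)}) - f_1(x^{(k)}) + \nabla f_0(x^{(k)})^T(\tilde y^{(k)} - x^{(k)})\big) \\
&= \frac{1}{(\lambda^{(k)})^2}\|x^{(k+1)} - x^{(k)}\|_{D_k}^2 - \alpha_k\epsilon_k \\
&\quad + \alpha_k\big(f_1(\tilde y^{(k)}) - f_1(x^{(k)}) + \nabla f_0(x^{(k)})^T(\tilde y^{(k)} - x^{(k)})\big), \qquad \text{(4.27)}
\end{aligned}$$

where the second inequality is obtained by adding and subtracting $f_1(x^{(k)})$ and by the convexity of $f_0$, the third one from the fact that $\bar x$ is a minimum point, and the last equality by definition of $x^{(k+1)}$. By equality (4.25) with $a = x^{(k+1)}$, $b = x^{(k)}$, $c = \bar x$, $D = D_k$ we obtain

$$\begin{aligned}
\|x^{(k+1)} - \bar x\|_{D_k}^2 &= \|x^{(k)} - \bar x\|_{D_k}^2 + \|x^{(k+1)} - x^{(k)}\|_{D_k}^2 - 2(x^{(k+1)} - x^{(k)})^T D_k(\bar x - x^{(k)}) \\
&= \|x^{(k)} - \bar x\|_{D_k}^2 + \|x^{(k+1)} - x^{(k)}\|_{D_k}^2 - 2\lambda^{(k)}(\tilde y^{(k)} - x^{(k)})^T D_k(\bar x - x^{(k)}) \\
&\overset{(4.27)}{\le} \|x^{(k)} - \bar x\|_{D_k}^2 + \Big(1 - \frac{2}{\lambda^{(k)}}\Big)\|x^{(k+1)} - x^{(k)}\|_{D_k}^2 \\
&\quad - 2\alpha_k\lambda^{(k)}\big(\nabla f_0(x^{(k)})^T(\tilde y^{(k)} - x^{(k)}) + f_1(\tilde y^{(k)}) - f_1(x^{(k)})\big) + 2\alpha_k\lambda^{(k)}\epsilon_k \\
&= \|x^{(k)} - \bar x\|_{D_k}^2 + \Big(1 - \frac{2-\gamma}{\lambda^{(k)}}\Big)\|x^{(k+1)} - x^{(k)}\|_{D_k}^2 - 2\alpha_k\lambda^{(k)} h_{\sigma^{(k)},\gamma}(\tilde y^{(k)},x^{(k)}) + 2\alpha_k\lambda^{(k)}\epsilon_k \\
&\le \|x^{(k)} - \bar x\|_{D_k}^2 - 2\alpha_k\lambda^{(k)} h_{\sigma^{(k)},\gamma}(\tilde y^{(k)},x^{(k)}) + 2\alpha_k\lambda^{(k)}\epsilon_k, \qquad \text{(4.28)}
\end{aligned}$$

where the third equality is obtained by adding and subtracting the term $\gamma\lambda^{(k)}\|\tilde y^{(k)} - x^{(k)}\|_{D_k}^2 = \frac{\gamma}{\lambda^{(k)}}\|x^{(k+1)} - x^{(k)}\|_{D_k}^2$, and the last inequality follows from the fact that $\gamma \in [0,1]$ and $\lambda^{(k)} \in (0,1]$. From assumption (H2) we obtain

$$\begin{aligned}
\|x^{(k+1)} - \bar x\|_{D_{k+1}}^2 &\le (1+\zeta_k)\|x^{(k+1)} - \bar x\|_{D_k}^2 \\
&\le (1+\zeta_k)\|x^{(k)} - \bar x\|_{D_k}^2 - 2\alpha_k(1+\zeta_k)\lambda^{(k)} h_{\sigma^{(k)},\gamma}(\tilde y^{(k)},x^{(k)}) + 2\alpha_k\lambda^{(k)}(1+\zeta_k)\epsilon_k \\
&\le (1+\zeta_k)\|x^{(k)} - \bar x\|_{D_k}^2 - 2\alpha_{\max}\bar\zeta\lambda^{(k)} h_{\sigma^{(k)},\gamma}(\tilde y^{(k)},x^{(k)}) + 2\alpha_{\max}\bar\zeta\epsilon_k, \qquad \text{(4.29)}
\end{aligned}$$

where we set $\bar\zeta = 1 + \max_k\zeta_k$. Since $\{\epsilon_k\}_{k\in\mathbb{N}}$ is summable by (H1), and $\{-\lambda^{(k)} h_{\sigma^{(k)},\gamma}(\tilde y^{(k)},x^{(k)})\}_{k\in\mathbb{N}}$ is summable as a consequence of the Armijo condition and of the boundedness from below of $f$ (recall that $X^* \neq \emptyset$), from [25, Lemma 2.2.2] we can conclude that the sequence $\{\|x^{(k)} - \bar x\|_{D_k}^2\}_{k\in\mathbb{N}}$ converges. In particular, since $D_k \in \mathcal{M}_\mu$, $\{x^{(k)}\}_{k\in\mathbb{N}}$ is bounded and, thus, it has at least one limit point. Let us denote such limit point by $x^\infty$. By Corollary 4.1, $x^\infty$ is stationary; in particular, since $f$ is convex, it is a minimum point, i.e. $x^\infty \in X^*$ and, thus, $\{\|x^{(k)} - x^\infty\|_{D_k}^2\}_{k\in\mathbb{N}}$ converges. Let $\{x^{(k_i)}\}_{i\in\mathbb{N}}$ be a subsequence of $\{x^{(k)}\}_{k\in\mathbb{N}}$ which converges to $x^\infty$. By the norm inequality (1.5) we can write

$$\|x^{(k_i)} - x^\infty\|_{D_{k_i}}^2 \le \mu\|x^{(k_i)} - x^\infty\|^2 \xrightarrow{\;i\to\infty\;} 0.$$

Since $\{\|x^{(k)} - x^\infty\|_{D_k}^2\}_{k\in\mathbb{N}}$ converges, this implies that its limit is zero. Invoking again (1.5) we can write

$$\frac{1}{\mu}\|x^{(k)} - x^\infty\|^2 \le \|x^{(k)} - x^\infty\|_{D_k}^2 \xrightarrow{\;k\to\infty\;} 0,$$

which allows us to conclude that $\{x^{(k)}\}_{k\in\mathbb{N}}$ converges to $x^\infty$. □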


In the following we present a variation of Theorem 4.3 where the tolerance parameters $\epsilon_k$ are adaptively chosen, instead of being a prefixed summable sequence.

Theorem 4.4 Let $0 < \alpha_{\min} \le \alpha_{\max}$, $\gamma \in [0,1]$, $\{\alpha_k\}_{k\in\mathbb{N}} \subset [\alpha_{\min},\alpha_{\max}]$. Assume that $f_0$ in (1.1) is convex and the solution set $X^*$ of problem (1.1) is not empty. Let $\{x^{(k)}\}_{k\in\mathbb{N}}$ be the sequence generated as

$$x^{(k+1)} = x^{(k)} + \lambda^{(k)} d^{(k)}, \qquad d^{(k)} = \tilde y^{(k)} - x^{(k)},$$

where $\lambda^{(k)}$ is obtained by means of the backtracking procedure in Algorithm LS, with $\tilde y^{(k)}$ satisfying $h_{\sigma^{(k)},\gamma}(\tilde y^{(k)},x^{(k)}) < 0$. Moreover, assume that

(H1') $\tilde y^{(k)}$ satisfies (4.17), where the sequence $\{\epsilon_k\}_{k\in\mathbb{N}}$ satisfies

$$\epsilon_k \le -\tau\, h_{\sigma^{(k)},\gamma}(\tilde y^{(k)},x^{(k)}) \tag{4.30}$$

for some $\tau > 0$,

and that hypothesis (H2) of Theorem 4.3 holds. Then, the sequence $\{x^{(k)}\}_{k\in\mathbb{N}}$ converges to a solution of (1.1).

Proof. By substituting (4.30) in the second inequality of (4.29) we obtain

$$\|x^{(k+1)} - \bar x\|_{D_{k+1}}^2 \le (1+\zeta_k)\|x^{(k)} - \bar x\|_{D_k}^2 - 2\alpha_{\max}\bar\zeta(1+\tau)\lambda^{(k)} h_{\sigma^{(k)},\gamma}(\tilde y^{(k)},x^{(k)}),$$

with $\bar\zeta = 1 + \max_k\zeta_k$. The rest of the proof follows from exactly the same arguments employed in the proof of Theorem 4.3. □


We will show in Section 5.4 how the conditions (4.17) and (4.30) can be satisfied in practice. 

Assumption (H2) is analogous to the one proposed in [14, 17]. A special case of it is the following:

(H2') $\{D_k\}_{k\in\mathbb{N}} \subset \mathcal{M}_{\mu_k}$, where $\mu_k = \sqrt{1+\xi_k}$, $\xi_k \ge 0$, $\sum_{k=0}^{\infty}\xi_k < \infty$.

Thanks to the inequality (1.5), for any $x \in \mathbb{R}^n$ we have

$$x^T(\mu_k\mu_{k+1}D_k - D_{k+1})x = \mu_k\mu_{k+1}\, x^T D_k x - x^T D_{k+1}x \ge \mu_k\mu_{k+1}\frac{\|x\|^2}{\mu_k} - \mu_{k+1}\|x\|^2 = 0,$$

which implies $D_{k+1} \preceq \mu_k\mu_{k+1}D_k$. Moreover, $\mu_k\mu_{k+1}$ can be written as $\mu_k\mu_{k+1} = 1 + \zeta_k$, where $\zeta_k = \sqrt{(1+\xi_k)(1+\xi_{k+1})} - 1$. Since $\lim_{t\to 0}(\sqrt{1+t} - 1)/t = 1/2$, it follows that $\sum_{k=0}^{\infty}\xi_k$ and $\sum_{k=0}^{\infty}\zeta_k$ have the same behaviour. Then, we can conclude that (H2') implies (H2).

We also observe that, employing the same arguments as above, we can also prove that $\mu_{k+1}\mu_k D_{k+1} \succeq D_k$ and, as a consequence, (H2') also implies that $(1+\zeta_k)D_{k+1} \succeq D_k$ with $\sum_{k=0}^{\infty}\zeta_k < \infty$.

In practice, (H2') says that the scaling matrices have to converge to the identity matrix at a certain rate, while (H2) implies the convergence to some symmetric positive definite matrix (see Lemma 2.3 in [17]).
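As a quick numerical sanity check of the argument above, the sketch below takes $\xi_k = c/k^2$ (with $c = 10^{10}$ this is the choice of $\mu_k$ made in Section 6; here $c = 1$ only keeps the numbers readable), draws hypothetical diagonal matrices $D_k \in \mathcal{M}_{\mu_k}$ at random, and verifies $D_{k+1} \preceq (1+\zeta_k)D_k$, together with $\zeta_k \approx (\xi_k + \xi_{k+1})/2$ for large $k$. The function names are illustrative.

```python
import numpy as np


def mu(k, c):
    """mu_k = sqrt(1 + xi_k) with xi_k = c / k**2 (k >= 1), as in (H2')."""
    return np.sqrt(1.0 + c / k**2)


def zeta(k, c):
    """zeta_k = mu_k * mu_{k+1} - 1 = sqrt((1 + xi_k)(1 + xi_{k+1})) - 1."""
    return mu(k, c) * mu(k + 1, c) - 1.0


def random_diag_in_M(mu_k, n, rng):
    """Diagonal of a random D in M_{mu_k}: eigenvalues drawn log-uniformly in [1/mu_k, mu_k]."""
    return np.exp(rng.uniform(-np.log(mu_k), np.log(mu_k), n))


rng = np.random.default_rng(0)
c, n = 1.0, 4
ok = True
for k in range(1, 200):
    d_k = random_diag_in_M(mu(k, c), n, rng)
    d_next = random_diag_in_M(mu(k + 1, c), n, rng)
    # For diagonal matrices, D_{k+1} <= (1 + zeta_k) D_k is an entrywise comparison.
    ok &= bool(np.all(d_next <= (1.0 + zeta(k, c)) * d_k * (1.0 + 1e-12)))
print(ok)

k = 10_000
xi_k, xi_next = c / k**2, c / (k + 1) ** 2
print(zeta(k, c) / ((xi_k + xi_next) / 2))   # close to 1 for large k
```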

4.5.2. Convergence rate analysis. In this section we analyze the convergence rate of the objective function values $f(x^{(k)})$ to the optimal one, $f^*$, proving that $f(x^{(k+1)}) - f^* = O(1/k)$. This complexity result is obtained in the same setting as Theorem 4.4, but further assuming that the gradient of $f_0$ is Lipschitz continuous on the domain of $f_1$. This Lipschitz assumption guarantees that the sequence $\{\lambda^{(k)}\}_{k\in\mathbb{N}}$ is bounded away from zero. Before giving the main results, we need to prove the following lemma, which actually does not require the Lipschitz assumption.


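Lemma 4.3 Let $x^{(k)}, \tilde y^{(k)} \in \Omega$. If $\tilde y^{(k)}$ satisfies (4.17), with $0 < \alpha_k \le \alpha_{\max}$ and $D_k \in \mathcal{M}_\mu$, then

$$\frac{1}{2\alpha_{\max}\mu}\|\tilde y^{(k)} - x^{(k)}\|^2 \le -h_{\sigma^{(k)},\gamma}(\tilde y^{(k)},x^{(k)}) + \epsilon_k. \tag{4.31}$$

Proof. For any $w \in \partial_{\epsilon_k} f_1(\tilde y^{(k)})$ we have

$$\begin{aligned}
h_{\sigma^{(k)}}(\tilde y^{(k)},x^{(k)}) &= \nabla f_0(x^{(k)})^T(\tilde y^{(k)} - x^{(k)}) + \frac{1}{2\alpha_k}\|\tilde y^{(k)} - x^{(k)}\|_{D_k}^2 + f_1(\tilde y^{(k)}) - f_1(x^{(k)}) \\
&\le \nabla f_0(x^{(k)})^T(\tilde y^{(k)} - x^{(k)}) + \frac{1}{2\alpha_k}\|\tilde y^{(k)} - x^{(k)}\|_{D_k}^2 + w^T(\tilde y^{(k)} - x^{(k)}) + \epsilon_k;
\end{aligned}$$

in particular, the previous inequality holds true for $w = \frac{1}{\alpha_k}D_k(z^{(k)} - \tilde y^{(k)})$ (see (4.17)). This results in

$$\begin{aligned}
h_{\sigma^{(k)},\gamma}(\tilde y^{(k)},x^{(k)}) &\le h_{\sigma^{(k)}}(\tilde y^{(k)},x^{(k)}) \\
&\le \nabla f_0(x^{(k)})^T(\tilde y^{(k)} - x^{(k)}) + \frac{1}{2\alpha_k}\|\tilde y^{(k)} - x^{(k)}\|_{D_k}^2 \\
&\quad + \frac{1}{\alpha_k}\big(x^{(k)} - \alpha_k D_k^{-1}\nabla f_0(x^{(k)}) - \tilde y^{(k)}\big)^T D_k(\tilde y^{(k)} - x^{(k)}) + \epsilon_k \\
&= -\frac{1}{2\alpha_k}\|\tilde y^{(k)} - x^{(k)}\|_{D_k}^2 + \epsilon_k \le -\frac{1}{2\alpha_{\max}\mu}\|\tilde y^{(k)} - x^{(k)}\|^2 + \epsilon_k,
\end{aligned}$$

where the last inequality follows from (1.5). □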
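Proposition 4.2 Let $\{x^{(k)}\}_{k\in\mathbb{N}}$ be a sequence of points in $\Omega$ and $\{d^{(k)}\}_{k\in\mathbb{N}}$ a sequence of descent directions such that $d^{(k)} = \tilde y^{(k)} - x^{(k)}$ and (4.31) holds. Let $\{\lambda^{(k)}\}_{k\in\mathbb{N}}$ be the steplength sequence computed by Algorithm LS and assume that $\nabla f_0$ is Lipschitz continuous on $\Omega$ and that (4.30) holds. Then, there exists $\lambda_{\min} \in \mathbb{R}_{>0}$ such that

$$\lambda^{(k)} \ge \lambda_{\min} \quad \forall k \in \mathbb{N}. \tag{4.32}$$

Proof. In view of (4.30)-(4.31), setting $a = \alpha_{\max}\mu$, one obtains

$$\|d^{(k)}\|^2 \le -2a(1+\tau)\, h_{\sigma^{(k)},\gamma}(\tilde y^{(k)},x^{(k)}). \tag{4.33}$$

If $\nabla f_0$ is Lipschitz continuous on $\Omega$ with Lipschitz constant $L$, then from the descent lemma [6, p. 667] we have

$$f_0(x^{(k)} + \lambda d^{(k)}) \le f_0(x^{(k)}) + \lambda\nabla f_0(x^{(k)})^T d^{(k)} + \frac{L}{2}\lambda^2\|d^{(k)}\|^2, \tag{4.34}$$

where $\lambda \in [0,1]$. By combining inequalities (4.33) and (4.34) we further obtain

$$f_0(x^{(k)} + \lambda d^{(k)}) \le f_0(x^{(k)}) + \lambda\nabla f_0(x^{(k)})^T d^{(k)} - a(1+\tau)L\lambda^2 h_{\sigma^{(k)},\gamma}(\tilde y^{(k)},x^{(k)}).$$

Adding $f_1(x^{(k)} + \lambda d^{(k)})$ to both sides of the previous relation and applying the Jensen inequality $f_1(x^{(k)} + \lambda d^{(k)}) \le (1-\lambda)f_1(x^{(k)}) + \lambda f_1(\tilde y^{(k)})$ to the right-hand side yields

$$\begin{aligned}
f(x^{(k)} + \lambda d^{(k)}) &\le f(x^{(k)}) - \lambda f_1(x^{(k)}) + \lambda f_1(\tilde y^{(k)}) + \lambda\nabla f_0(x^{(k)})^T d^{(k)} - aL\lambda^2(1+\tau)\, h_{\sigma^{(k)},\gamma}(\tilde y^{(k)},x^{(k)}) \\
&\le f(x^{(k)}) - \lambda f_1(x^{(k)}) + \lambda f_1(\tilde y^{(k)}) + \lambda\nabla f_0(x^{(k)})^T d^{(k)} + \frac{\gamma\lambda}{2\alpha_k}\|d^{(k)}\|_{D_k}^2 \\
&\quad - aL\lambda^2(1+\tau)\, h_{\sigma^{(k)},\gamma}(\tilde y^{(k)},x^{(k)}) \\
&= f(x^{(k)}) + \lambda\, h_{\sigma^{(k)},\gamma}(\tilde y^{(k)},x^{(k)}) - aL\lambda^2(1+\tau)\, h_{\sigma^{(k)},\gamma}(\tilde y^{(k)},x^{(k)}) \\
&= f(x^{(k)}) + \lambda\big(1 - aL(1+\tau)\lambda\big)\, h_{\sigma^{(k)},\gamma}(\tilde y^{(k)},x^{(k)}).
\end{aligned}$$

The previous inequality ensures that the Armijo condition

$$f(x^{(k)} + \lambda d^{(k)}) \le f(x^{(k)}) + \beta\lambda\, h_{\sigma^{(k)},\gamma}(\tilde y^{(k)},x^{(k)}) \tag{4.35}$$

is satisfied, for all $k \in \mathbb{N}$, when $1 - aL(1+\tau)\lambda \ge \beta$, that is for all $\lambda$ such that $\lambda \le (1-\beta)/(aL(1+\tau))$. If $\lambda^{(k)}$ is the steplength computed by Algorithm LS and the backtracking loop is performed at least once, then $\lambda = \lambda^{(k)}/\delta$ does not satisfy inequality (4.35), which means $\lambda^{(k)} > (1-\beta)\delta/(aL(1+\tau))$; otherwise $\lambda^{(k)} = 1$. Thus, the steplength sequence $\{\lambda^{(k)}\}_{k\in\mathbb{N}}$ satisfies inequality (4.32) with $\lambda_{\min} = \min\{1, (1-\beta)\delta/(aL(1+\tau))\}$. □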
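The bound of Proposition 4.2 can be observed on a small example. The sketch below assumes a least-squares term $f_0(x) = \frac12\|Bx - b\|^2$ (so that $L = \|B\|_2^2$), $f_1 = \iota_{\mathbb{R}^n_{\ge 0}}$, $D_k = I$ (hence $\mu = 1$), a fixed $\alpha_k = \alpha_{\max}$, $\gamma = 1$ and exact proximal points, so that $\epsilon_k = 0$ and (4.30) holds for every $\tau > 0$; letting $\tau \to 0$, the bound becomes $\lambda^{(k)} \ge \min\{1, (1-\beta)\delta/(\alpha_{\max}L)\}$. The backtracking loop is our reading of Algorithm LS (start from $\lambda = 1$ and multiply by $\delta$ until (4.35) holds), not a verbatim copy, and the data are random.

```python
import numpy as np


def armijo_backtracking(f, x, d, h_gamma, beta, delta, max_backtracks=60):
    """Start from lam = 1 and shrink by delta until f(x + lam*d) <= f(x) + beta*lam*h_gamma."""
    lam, fx = 1.0, f(x)
    for _ in range(max_backtracks):
        if f(x + lam * d) <= fx + beta * lam * h_gamma:
            break
        lam *= delta
    return lam


def projected_gradient_ls(B, b, x0, alpha, n_iter=50, beta=1e-4, delta=0.5, gamma=1.0):
    """Line-search iteration with exact prox: f0 = 0.5*||Bx - b||^2, f1 = indicator of x >= 0."""
    f = lambda v: 0.5 * np.sum((B @ v - b) ** 2)     # f1 = 0 at feasible points
    x, lams = x0.copy(), []
    for _ in range(n_iter):
        g = B.T @ (B @ x - b)                           # gradient of f0
        d = np.maximum(x - alpha * g, 0.0) - x          # d = y_hat - x (projection of z)
        h_gamma = g @ d + gamma / (2.0 * alpha) * (d @ d)
        if h_gamma > -1e-10:                            # numerically stationary: stop
            break
        lam = armijo_backtracking(f, x, d, h_gamma, beta, delta)
        lams.append(lam)
        x = x + lam * d
    return x, lams


rng = np.random.default_rng(0)
B, b = rng.standard_normal((30, 10)), rng.standard_normal(30)
L = np.linalg.norm(B, 2) ** 2                           # Lipschitz constant of grad f0
alpha_max, beta, delta = 10.0 / L, 1e-4, 0.5            # deliberately long steps
x, lams = projected_gradient_ls(B, b, np.zeros(10), alpha_max, beta=beta, delta=delta)
lam_min = min(1.0, (1.0 - beta) * delta / (alpha_max * L))
print(min(lams), lam_min)
```

Choosing $\alpha_{\max}$ well above $1/L$ forces the backtracking loop to act, which is where the lower bound becomes informative.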

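Based on these premises, we are now ready to prove the convergence rate result.

Theorem 4.5 Assume that the hypotheses of Theorem 4.4 hold and, in addition, that the gradient of $f_0$ is Lipschitz continuous on $\Omega$. Let $f^*$ be the optimal function value for problem (1.1). Then, we have

$$f(x^{(k+1)}) - f^* = O\Big(\frac{1}{k}\Big).$$

Proof. If we do not neglect the term $f(x^{(k)}) - f(\bar x) = f(x^{(k)}) - f^*$ in (4.26) and in all the subsequent inequalities, instead of (4.28) we obtain

$$\|x^{(k+1)} - \bar x\|_{D_k}^2 \le \|x^{(k)} - \bar x\|_{D_k}^2 + 2\alpha_k\lambda^{(k)}\big(-h_{\sigma^{(k)},\gamma}(\tilde y^{(k)},x^{(k)}) + \epsilon_k\big) - 2\lambda^{(k)}\alpha_k\big(f(x^{(k)}) - f^*\big),$$

and hence:

$$\begin{aligned}
\|x^{(k+1)} - \bar x\|_{D_{k+1}}^2 &\le (1+\zeta_k)\|x^{(k+1)} - \bar x\|_{D_k}^2 \\
&\le (1+\zeta_k)\|x^{(k)} - \bar x\|_{D_k}^2 + 2\alpha_k\lambda^{(k)}(1+\zeta_k)\big(-h_{\sigma^{(k)},\gamma}(\tilde y^{(k)},x^{(k)}) + \epsilon_k\big) \\
&\quad - 2\lambda^{(k)}(1+\zeta_k)\alpha_k\big(f(x^{(k)}) - f^*\big) \\
&\overset{(4.30)}{\le} (1+\zeta_k)\|x^{(k)} - \bar x\|_{D_k}^2 - 2\alpha_{\max}(1+\tau)\bar\zeta\lambda^{(k)} h_{\sigma^{(k)},\gamma}(\tilde y^{(k)},x^{(k)}) + a\big(f^* - f(x^{(k)})\big),
\end{aligned}$$

where we set $\bar\zeta = 1 + \max_k\zeta_k$ and $a = 2\lambda_{\min}\alpha_{\min}$, $\lambda_{\min}$ being defined in Proposition 4.2. Summing the previous inequality from 0 to $k$ gives

$$\begin{aligned}
\|x^{(k+1)} - \bar x\|_{D_{k+1}}^2 &\le \|x^{(0)} - \bar x\|_{D_0}^2 + \sum_{j=0}^{k}\zeta_j\|x^{(j)} - \bar x\|_{D_j}^2 - 2\alpha_{\max}(1+\tau)\bar\zeta\sum_{j=0}^{k}\lambda^{(j)} h_{\sigma^{(j)},\gamma}(\tilde y^{(j)},x^{(j)}) \\
&\quad + a\Big[(k+1)f^* - \sum_{j=0}^{k} f(x^{(j)})\Big] \\
&\le \|x^{(0)} - \bar x\|_{D_0}^2 + M\zeta' + \frac{2\alpha_{\max}(1+\tau)\bar\zeta}{\beta}\big(f(x^{(0)}) - f^*\big) + a\Big[(k+1)f^* - \sum_{j=0}^{k} f(x^{(j)})\Big],
\end{aligned}$$

where the second inequality follows by setting $\zeta' = \sum_{j=0}^{\infty}\zeta_j$, from the fact that $\{\|x^{(k)} - \bar x\|_{D_k}^2\}_{k\in\mathbb{N}}$ is a convergent sequence (see Theorem 4.4), so that there exists $M$ such that $\|x^{(k)} - \bar x\|_{D_k}^2 \le M$, and from (4.9).

Adding the nonnegative quantity $a\big(f(x^{(0)}) - f^*\big)$ to the right-hand side of the last inequality we obtain

$$\|x^{(k+1)} - \bar x\|_{D_{k+1}}^2 \le \|x^{(0)} - \bar x\|_{D_0}^2 + M\zeta' + \frac{2\alpha_{\max}(1+\tau)\bar\zeta}{\beta}\big(f(x^{(0)}) - f^*\big) + a\Big[k f^* - \sum_{j=1}^{k} f(x^{(j)})\Big].$$

Moreover, exploiting the inequality

$$0 \le \sum_{j=0}^{k} j\big(f(x^{(j)}) - f(x^{(j+1)})\big) = \sum_{j=1}^{k} f(x^{(j)}) - k f(x^{(k+1)}),$$

which holds because the sequence $\{f(x^{(k)})\}_{k\in\mathbb{N}}$ is nonincreasing, gives

$$\|x^{(k+1)} - \bar x\|_{D_{k+1}}^2 \le \|x^{(0)} - \bar x\|_{D_0}^2 + M\zeta' + \frac{2\alpha_{\max}(1+\tau)\bar\zeta}{\beta}\big(f(x^{(0)}) - f^*\big) + ak\big(f^* - f(x^{(k+1)})\big).$$

Rearranging terms, this finally yields

$$f(x^{(k+1)}) - f^* \le \frac{1}{ak}\Big(\|x^{(0)} - \bar x\|_{D_0}^2 + M\zeta' + \frac{2\alpha_{\max}(1+\tau)\bar\zeta}{\beta}\big(f(x^{(0)}) - f(\bar x)\big)\Big),$$

establishing the result. □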

5. Practical computation of η- and ε-approximations.

5.1. Computing η-approximations. In this section we discuss how to compute a point $\tilde y^{(k)}$ such that (4.20) holds, i.e. satisfying

$$h_{\sigma^{(k)}}(\tilde y^{(k)},x^{(k)}) \le \eta\, h_{\sigma^{(k)}}(\hat y^{(k)},x^{(k)}), \quad \text{with } \hat y^{(k)} = p(x^{(k)}; h_{\sigma^{(k)}}), \tag{5.1}$$

for a given $\eta \in (0,1]$, without knowing $\hat y^{(k)}$. A special case of this problem, corresponding to the case $f_1 = \iota_\Omega$, where $\Omega$ is the intersection of closed, convex sets and the metric is given by (3.1), is considered in [7]. This is possible when, for each $k$, one can compute a sequence $\{a_l\}_{l\in\mathbb{N}} \subset \mathbb{R}$ such that

$$a_l \le h_{\sigma^{(k)}}(\hat y^{(k)},x^{(k)}) \quad \forall l \in \mathbb{N}, \qquad \lim_{l\to\infty} a_l = h_{\sigma^{(k)}}(\hat y^{(k)},x^{(k)}), \tag{5.2}$$

and a sequence of points $\{\tilde y^{(k,l)}\}_{l\in\mathbb{N}}$ such that

$$\lim_{l\to\infty} h_{\sigma^{(k)}}(\tilde y^{(k,l)},x^{(k)}) = h_{\sigma^{(k)}}(\hat y^{(k)},x^{(k)}). \tag{5.3}$$

In practice, $l$ should be considered as the index of an inner loop for computing $\tilde y^{(k)}$. Indeed, when (5.2) holds, we also have

$$\eta\, a_l \le \eta\, h_{\sigma^{(k)}}(\hat y^{(k)},x^{(k)}) \quad \forall l \in \mathbb{N}. \tag{5.4}$$

Moreover (assuming that $x^{(k)}$ is not stationary, so that $h_{\sigma^{(k)}}(\hat y^{(k)},x^{(k)}) < 0$, and that $\eta < 1$), for all sufficiently large $l$ we have $a_l > h_{\sigma^{(k)}}(\hat y^{(k)},x^{(k)})/\eta$ which, together with (5.4), gives

$$h_{\sigma^{(k)}}(\hat y^{(k)},x^{(k)}) < \eta\, a_l \le \eta\, h_{\sigma^{(k)}}(\hat y^{(k)},x^{(k)}).$$

Then, if one considers any method generating a sequence $\tilde y^{(k,l)}$ such that (5.3) holds, the stopping criterion

$$h_{\sigma^{(k)}}(\tilde y^{(k,l)},x^{(k)}) \le \eta\, a_l \tag{5.5}$$

for the inner iterations is well defined. If $\bar l$ is the smallest integer such that (5.5) is satisfied, then the point $\tilde y^{(k)} = \tilde y^{(k,\bar l)}$ satisfies (5.1). In the following sections we show how to compute a sequence $a_l$ satisfying (5.2) in an interesting case.

5.2. Composition with a linear operator. In this section we assume that $f_1(x)$ is given by

$$f_1(x) = g(Ax), \tag{5.6}$$

where $A \in \mathbb{R}^{m\times n}$ and $g: \mathbb{R}^m \to \bar{\mathbb{R}}$ is a convex function. Moreover, we choose $d_\sigma$ as in (3.1). Let us consider the minimum problem (3.4), which can be written in equivalent primal-dual and dual form as

$$\min_{y\in\mathbb{R}^n} h_{\sigma^{(k)}}(y,x^{(k)}) = \min_{y\in\mathbb{R}^n}\max_{v\in\mathbb{R}^m} F_{\sigma^{(k)}}(y,v,x^{(k)}) = \max_{v\in\mathbb{R}^m}\Psi_{\sigma^{(k)}}(v,x^{(k)}).$$

The primal-dual problem can be obtained from the primal one by applying Definition 2.4 of the convex conjugate, which gives $g(Ax) = \max_{v\in\mathbb{R}^m} v^T Ax - g^*(v)$, obtaining

$$F_{\sigma^{(k)}}(y,v,x^{(k)}) = \frac{1}{2\alpha_k}\|y - z^{(k)}\|_{D_k}^2 + y^T A^T v - g^*(v) - f_1(x^{(k)}) - \frac{\alpha_k}{2}\|\nabla f_0(x^{(k)})\|_{D_k^{-1}}^2 \tag{5.7}$$

with $z^{(k)} = x^{(k)} - \alpha_k D_k^{-1}\nabla f_0(x^{(k)})$. The dual problem is obtained by computing the minimum of the primal-dual function with respect to $y$, which is attained at $y = z^{(k)} - \alpha_k D_k^{-1}A^T v$, and substituting it in (5.7), obtaining the explicit expression of the dual function

$$\Psi_{\sigma^{(k)}}(v,x^{(k)}) = -\frac{1}{2\alpha_k}\|\alpha_k D_k^{-1}A^T v - z^{(k)}\|_{D_k}^2 - g^*(v) - f_1(x^{(k)}) - \frac{\alpha_k}{2}\|\nabla f_0(x^{(k)})\|_{D_k^{-1}}^2 + \frac{1}{2\alpha_k}\|z^{(k)}\|_{D_k}^2.$$

By definition of the primal-dual and dual functions, the following inequalities hold:

$$h_{\sigma^{(k)}}(y,x^{(k)}) \ge F_{\sigma^{(k)}}(y,v,x^{(k)}) \ge \Psi_{\sigma^{(k)}}(v,x^{(k)}) \quad \forall y \in \mathbb{R}^n,\ v \in \mathbb{R}^m.$$

In particular, the previous inequality holds for $y = \hat y^{(k)}$. Then, an approximation $\tilde y^{(k)}$ of $\hat y^{(k)}$ can be computed by applying any method to the dual problem

$$\max_{v\in\mathbb{R}^m}\Psi_{\sigma^{(k)}}(v,x^{(k)}), \tag{5.8}$$

generating a sequence $\{v^{(l)}\}_{l\in\mathbb{N}}$ such that $\Psi_{\sigma^{(k)}}(v^{(l)},x^{(k)})$ converges to the maximum of the dual function $\Psi_{\sigma^{(k)}}(\cdot,x^{(k)})$. As a consequence of this, setting $\tilde y^{(k,l)} = z^{(k)} - \alpha_k D_k^{-1}A^T v^{(l)}$, a point satisfying (5.1) can be found by stopping the dual iterations when

$$h_{\sigma^{(k)}}(\tilde y^{(k,l)},x^{(k)}) \le \eta\,\Psi_{\sigma^{(k)}}(v^{(l)},x^{(k)}) \tag{5.9}$$

is satisfied, i.e. (5.5) with $a_l = \Psi_{\sigma^{(k)}}(v^{(l)},x^{(k)})$.

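For example, one can apply a forward-backward method [16], also called ISTA, or its accelerated version (FISTA, [5]) to the dual problem. As an alternative, the saddle point problem

$$\min_{y\in\mathbb{R}^n}\max_{v\in\mathbb{R}^m} F_{\sigma^{(k)}}(y,v,x^{(k)})$$

can also be tackled, for example with a primal-dual method such as [12, 24], using (5.9) as stopping condition. More generally, a point $\tilde y^{(k)} \in P_\eta(x^{(k)}; h_{\sigma^{(k)}})$ can be obtained by computing two sequences, $\{v^{(l)}\}_{l\in\mathbb{N}} \subset \mathbb{R}^m$ and $\{\tilde y^{(k,l)}\}_{l\in\mathbb{N}} \subset \mathbb{R}^n$, such that

$$\lim_{l\to\infty}\Psi_{\sigma^{(k)}}(v^{(l)},x^{(k)}) = \max_{v\in\mathbb{R}^m}\Psi_{\sigma^{(k)}}(v,x^{(k)}) = \min_{y\in\mathbb{R}^n} h_{\sigma^{(k)}}(y,x^{(k)}) = \lim_{l\to\infty} h_{\sigma^{(k)}}(\tilde y^{(k,l)},x^{(k)}),$$

and stopping the iterates when (5.9) is met.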
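As an illustration of this strategy, the sketch below runs FISTA on the dual problem (5.8) and stops with criterion (5.9). Its ingredients are our assumptions, not the setting of the paper: a one-dimensional forward-difference matrix plays the role of $A$ in (5.6), $g = \rho\|\cdot\|_1$ (so that $g^*$ is the indicator of $\{\|v\|_\infty \le \rho\}$ and each dual step is a projected gradient ascent step), $D_k$ is diagonal, and the momentum rule is $t_l = (l + a - 1)/a$ with $a > 2$ (the Chambolle-Dossal variant of FISTA). The function names `forward_diff` and `dual_fista_prox` are hypothetical.

```python
import numpy as np


def forward_diff(n):
    """(n-1) x n forward-difference matrix, a 1-D stand-in for A in (5.6)."""
    A = np.zeros((n - 1, n))
    i = np.arange(n - 1)
    A[i, i], A[i, i + 1] = -1.0, 1.0
    return A


def dual_fista_prox(x, grad_f0, dvec, alpha, rho, eta, a=2.1, max_inner=10_000):
    """Approximate p(x; h_sigma) for f1(y) = rho*||A y||_1 and D = diag(dvec).

    Returns (y, v, h(y), Psi(v), number of inner iterations)."""
    A = forward_diff(x.size)
    z = x - alpha * grad_f0 / dvec                          # z = x - alpha D^{-1} grad f0(x)
    const = -rho * np.abs(A @ x).sum() - 0.5 * alpha * np.sum(grad_f0**2 / dvec)

    def h(y):      # primal objective h_sigma(y, x)
        return 0.5 / alpha * np.sum(dvec * (y - z) ** 2) + rho * np.abs(A @ y).sum() + const

    def psi(v):    # dual objective Psi_sigma(v, x), valid for ||v||_inf <= rho
        r = alpha * (A.T @ v) / dvec - z
        return -0.5 / alpha * np.sum(dvec * r**2) + 0.5 / alpha * np.sum(dvec * z**2) + const

    lip = 4.0 * alpha / dvec.min()                          # bound on the Lipschitz constant of grad Psi
    v = np.zeros(x.size - 1)
    w = v.copy()
    for l in range(1, max_inner + 1):
        grad = A @ (z - alpha * (A.T @ w) / dvec)           # gradient of the concave dual at w
        v_new = np.clip(w + grad / lip, -rho, rho)          # projected ascent step
        t_l, t_next = (l + a - 1) / a, (l + a) / a          # Chambolle-Dossal momentum
        w = v_new + (t_l - 1.0) / t_next * (v_new - v)
        v = v_new
        y = z - alpha * (A.T @ v) / dvec                    # primal point from the dual iterate
        if h(y) <= eta * psi(v):                            # stopping criterion (5.9)
            break
    return y, v, h(y), psi(v), l


rng = np.random.default_rng(1)
n = 20
x, g = rng.standard_normal(n), rng.standard_normal(n)
dvec = rng.uniform(0.5, 2.0, n)
y, v, hy, ps, its = dual_fista_prox(x, g, dvec, alpha=0.5, rho=0.1, eta=0.5)
print(its, hy, ps)    # weak duality gives ps <= hy; (5.9) gives hy <= eta * ps
```

Because $\Psi_{\sigma^{(k)}}(v^{(l)},x^{(k)}) \le h_{\sigma^{(k)}}(\hat y^{(k)},x^{(k)})$, passing the test guarantees (5.1) without ever computing $\hat y^{(k)}$.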

Remarks. We observe that (5.6) also includes the case where $f_1(x)$ is defined as $f_1(x) = \sum_{i=1}^{p} g_i(A_i x)$, where $A_i \in \mathbb{R}^{m_i\times n}$, $g_i: \mathbb{R}^{m_i} \to \bar{\mathbb{R}}$. Indeed, formulation (5.6) is recovered by setting $A = [A_1^T\ A_2^T\ \cdots\ A_p^T]^T \in \mathbb{R}^{m\times n}$ with $m = \sum_i m_i$. In this case the dual variable $v$ can be partitioned as $v = [v_1^T\ v_2^T\ \cdots\ v_p^T]^T$, where $v_i \in \mathbb{R}^{m_i}$ and $g^*(v) = \sum_{i=1}^{p} g_i^*(v_i)$ (see [35, Theorem 2.3.1 (iv)]).

5.3. Preserving feasibility. Clearly, any point $\tilde y^{(k,l)}$ satisfying (5.9), where $v^{(l)}$ is generated by any convergent algorithm applied to the dual or the primal-dual problem, belongs to the domain of $h_{\sigma^{(k)}}(\cdot,x^{(k)})$, i.e. to the set $\Omega$. Indeed, for any $l$, $v^{(l)}$ belongs to the domain of the dual function $\Psi_{\sigma^{(k)}}(\cdot,x^{(k)})$ and, as a consequence, (5.5) implies that $h_{\sigma^{(k)}}(\tilde y^{(k,l)},x^{(k)})$ is finite. However, the stopping criterion (5.5) may require a very large number of inner iterations $l$ to be satisfied and, in addition, the points $\tilde y^{(k,l)}$ of the primal sequence may be feasible only in the limit. For these reasons, we propose to also consider the sequence $\bar y^{(k,l)} = P_\Omega(\tilde y^{(k,l)})$, where $P_\Omega$ denotes the Euclidean projection onto the set $\Omega$. If, at some inner iteration $l$, the inequality

$$h_{\sigma^{(k)}}(\bar y^{(k,l)},x^{(k)}) \le \eta\,\Psi_{\sigma^{(k)}}(v^{(l)},x^{(k)}) \tag{5.10}$$

is satisfied, this clearly means that $\bar y^{(k,l)} \in P_\eta(x^{(k)}; h_{\sigma^{(k)}})$ (i.e., (4.20) is satisfied) and we can set $\tilde y^{(k)} = \bar y^{(k,l)}$. We observe that, when $\tilde y^{(k,l)}$ converges to $\hat y^{(k)}$ as $l$ diverges, the stopping criterion (5.10) is well defined, since $\bar y^{(k,l)}$ also converges to $\hat y^{(k)}$.

5.4. Computing ε-approximations. In this section we show how to compute a point satisfying inclusion (4.17), for any given $\epsilon_k \in \mathbb{R}_{>0}$, when the convex function $f_1$ in (1.1) has the form (5.6). Our arguments are obtained by extending those in [33], which are recovered by setting $D_k = I$. As done in Section 5.2, we will make use of duality theory. In particular, we define the primal-dual gap function as

$$G_{\sigma^{(k)}}(y,v,x^{(k)}) = h_{\sigma^{(k)}}(y,x^{(k)}) - \Psi_{\sigma^{(k)}}(v,x^{(k)}). \tag{5.11}$$

We also have the following result.

Proposition 5.1 Let $\epsilon_k \in \mathbb{R}_{>0}$. If

$$G_{\sigma^{(k)}}(\tilde y^{(k)},v,x^{(k)}) \le \epsilon_k, \tag{5.12}$$

with $\tilde y^{(k)} = z^{(k)} - \alpha_k D_k^{-1}A^T v$, for some $v \in \mathbb{R}^m$, then (4.17) is satisfied.
Proof. From the definition of the primal-dual gap, a simple computation shows that

$$\begin{aligned}
G_{\sigma^{(k)}}(\tilde y^{(k)},v,x^{(k)}) &= \frac{1}{\alpha_k}\|\alpha_k D_k^{-1}A^T v\|_{D_k}^2 - v^T A z^{(k)} + f_1(z^{(k)} - \alpha_k D_k^{-1}A^T v) + g^*(v) \\
&= \sup_{w\in\mathbb{R}^n}\ \frac{1}{\alpha_k}\|\alpha_k D_k^{-1}A^T v\|_{D_k}^2 - v^T A z^{(k)} + w^T(z^{(k)} - \alpha_k D_k^{-1}A^T v) - f_1^*(w) + g^*(v) \\
&= \sup_{w\in\mathbb{R}^n}\ (w - A^T v)^T(z^{(k)} - \alpha_k D_k^{-1}A^T v) - f_1^*(w) + g^*(v) \\
&\ge \sup_{w\in\mathbb{R}^n}\ (w - A^T v)^T(z^{(k)} - \alpha_k D_k^{-1}A^T v) - f_1^*(w) + f_1^*(A^T v),
\end{aligned}$$

where the last inequality follows from Proposition 2.1. Thus, if (5.12) holds, the previous inequality yields

$$(w - A^T v)^T(z^{(k)} - \alpha_k D_k^{-1}A^T v) - f_1^*(w) + f_1^*(A^T v) \le G_{\sigma^{(k)}}(\tilde y^{(k)},v,x^{(k)}) \le \epsilon_k \quad \forall w \in \mathbb{R}^n.$$

Rearranging terms, the previous inequality can also be written as

$$f_1^*(w) \ge f_1^*(A^T v) + (w - A^T v)^T(z^{(k)} - \alpha_k D_k^{-1}A^T v) - \epsilon_k \quad \forall w \in \mathbb{R}^n,$$

which, from definition (2.4), is equivalent to $z^{(k)} - \alpha_k D_k^{-1}A^T v \in \partial_{\epsilon_k} f_1^*(A^T v)$. Finally, by applying Proposition 2.2, we obtain $A^T v \in \partial_{\epsilon_k} f_1(z^{(k)} - \alpha_k D_k^{-1}A^T v)$. Recalling that $\tilde y^{(k)} = z^{(k)} - \alpha_k D_k^{-1}A^T v$, which implies $A^T v = D_k(z^{(k)} - \tilde y^{(k)})/\alpha_k$, (4.17) follows. □


The previous result suggests that, for computing $\tilde y^{(k)}$ satisfying the assumptions of Corollary 4.1, we can use the same iterative approaches described at the end of Section 5.2, stopping the iterates when

$$G_{\sigma^{(k)}}(\tilde y^{(k,l)},v^{(l)},x^{(k)}) \le \epsilon_k \quad\text{and}\quad h_{\sigma^{(k)},\gamma}(\tilde y^{(k,l)},x^{(k)}) < 0. \tag{5.13}$$

5.5. Equivalence between η- and ε-approximations. Any η-approximation $\tilde y^{(k)}$ satisfying (5.9) for some $v^{(l)} \in \mathbb{R}^m$ is also an ε-approximation, where $\epsilon = -\tau h_{\sigma^{(k)}}(\tilde y^{(k)},x^{(k)})$ and $\tau = -1 + 1/\eta$. In fact, in these settings, (5.9) implies $h_{\sigma^{(k)}}(\tilde y^{(k)},x^{(k)}) - \Psi_{\sigma^{(k)}}(v^{(l)},x^{(k)}) \le -\tau h_{\sigma^{(k)}}(\tilde y^{(k)},x^{(k)})$ and, as shown in Section 5.4, this means that $\tilde y^{(k)}$ is an ε-approximation with $\epsilon = -\tau h_{\sigma^{(k)}}(\tilde y^{(k)},x^{(k)})$. Thus, any point computed by an iterative procedure stopped when (5.9) is satisfied is both an η- and an ε-approximation.
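The one-line computation behind this equivalence is spelled out below; it only uses $\eta \in (0,1]$ and the definition (5.11) of the gap. For instance, the value $\eta = 10^{-6}$ adopted in Section 6 corresponds to $\tau = 10^6 - 1$.

```latex
\begin{aligned}
h_{\sigma^{(k)}}(\tilde y^{(k)},x^{(k)}) \le \eta\,\Psi_{\sigma^{(k)}}(v^{(l)},x^{(k)})
&\;\Longrightarrow\; -\Psi_{\sigma^{(k)}}(v^{(l)},x^{(k)}) \le -\tfrac{1}{\eta}\,h_{\sigma^{(k)}}(\tilde y^{(k)},x^{(k)}) \\
&\;\Longrightarrow\; G_{\sigma^{(k)}}(\tilde y^{(k)},v^{(l)},x^{(k)})
 = h_{\sigma^{(k)}}(\tilde y^{(k)},x^{(k)}) - \Psi_{\sigma^{(k)}}(v^{(l)},x^{(k)})
 \le \Big(1 - \tfrac{1}{\eta}\Big)\, h_{\sigma^{(k)}}(\tilde y^{(k)},x^{(k)}) \\
&\;\phantom{\Longrightarrow}\; = -\tau\, h_{\sigma^{(k)}}(\tilde y^{(k)},x^{(k)}),
 \qquad \tau = \tfrac{1}{\eta} - 1 \ge 0,
\end{aligned}
```

so Proposition 5.1 applies with $\epsilon_k = -\tau h_{\sigma^{(k)}}(\tilde y^{(k)},x^{(k)})$.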

6. Numerical illustration. In order to validate the proposed approach, we consider a relevant 
image restoration problem, whose variational formulation consists in minimizing the sum of a discrepancy 
functional plus a regularization term. Following the Bayesian paradigm, when the noise affecting the data 
is of Poisson type, a typical choice for measuring the discrepancy of a given image x from the observed 
data b is the following Kullback-Leibler divergence 

$$KL(x,b) = \sum_{i=1}^{n} b_i\log\Big(\frac{b_i}{x_i}\Big) + x_i - b_i.$$
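A direct transcription of this divergence is sketched below for illustration; it adopts the usual convention $0\log 0 = 0$ for entries with $b_i = 0$ and assumes $x_i > 0$ wherever $b_i > 0$ (otherwise the divergence is $+\infty$). The function name is ours.

```python
import numpy as np


def kl_divergence(x, b):
    """KL(x, b) = sum_i b_i*log(b_i/x_i) + x_i - b_i, with 0*log(0/x_i) = 0."""
    x = np.asarray(x, dtype=float)
    b = np.asarray(b, dtype=float)
    terms = x - b
    pos = b > 0                                   # only these entries carry the log term
    terms[pos] += b[pos] * np.log(b[pos] / x[pos])
    return float(terms.sum())


print(kl_divergence([1.0, 2.0], [1.0, 2.0]))     # identical arguments give 0.0
print(kl_divergence([3.0], [0.0]))               # an entry with b_i = 0 contributes x_i = 3.0
```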

Taking into account also the distortion due to the image acquisition system, which we assume to be modeled through a linear operator $H \in \mathbb{R}^{m\times n}$, and a constant background term $bg$, the data discrepancy is defined as

$$f_0(x) = KL(Hx + bg\,\mathbf{1}, b),$$

where $\mathbf{1} \in \mathbb{R}^n$ is the vector of all ones. Moreover, when one wants to preserve edges in the restored image and also the non-negativity of the pixel values, the regularization term can be chosen as

$$f_1(x) = \rho\sum_{i=1}^{n}\|\nabla_i x\| + \iota_{\mathbb{R}^n_{\ge 0}}(x),$$

where $\rho \in \mathbb{R}_{>0}$ is a regularization parameter multiplying the total variation functional [30] and $\nabla_i \in \mathbb{R}^{2\times n}$ represents the discrete gradient operator at the pixel $i$. Clearly, the function $f_1(x)$ has the form (5.6), with $A = (\nabla_1^T \cdots \nabla_n^T\ I)^T \in \mathbb{R}^{3n\times n}$. In this case $v \in \mathbb{R}^{3n}$ and $g^*$ is the indicator function of the set $B_\rho \times \cdots \times B_\rho \times \mathbb{R}^n_{\le 0}$, where $B_\rho \subset \mathbb{R}^2$ is the 2-dimensional Euclidean ball centered in 0 with radius $\rho$.

Table 6.1
Test problems description

problem     ref. image            size       range       $\sigma_{psf}$   bg     $\rho$
cameraman   Matlab cameraman      $256^2$    [0,1000]    1.4              5      0.0091
micro       [34, Figure 8]        $128^2$    [1,69]      3.2              0.5    0.09
phantom     Shepp-Logan phantom   $256^2$    [0,1000]    1.4              10     0.004
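For the dual approaches of Section 5.2 in this setting, the proximal step on $g^*$ reduces to the Euclidean projection onto $B_\rho \times \cdots \times B_\rho \times \mathbb{R}^n_{\le 0}$, which splits into $n$ radial shrinkages onto 2-D balls and a componentwise clipping. The sketch below assumes the ordering of $v$ induced by $A = (\nabla_1^T \cdots \nabla_n^T\ I)^T$, i.e. the $n$ pairs first and the block dual to the nonnegativity constraint last; the function name `project_dual` is hypothetical.

```python
import numpy as np


def project_dual(v, n, rho):
    """Project v in R^{3n} onto B_rho x ... x B_rho x R^n_{<=0}.

    v[:2n] holds the n pairs v_i in R^2 (dual to the pixel gradients), v[2n:] the block
    dual to the nonnegativity constraint.
    """
    v = np.asarray(v, dtype=float)
    pairs = v[: 2 * n].reshape(n, 2)
    norms = np.linalg.norm(pairs, axis=1, keepdims=True)
    pairs = pairs / np.maximum(1.0, norms / rho)      # shrink onto the ball only if outside
    tail = np.minimum(v[2 * n :], 0.0)                 # projection onto the nonpositive orthant
    return np.concatenate([pairs.ravel(), tail])


print(project_dual([3.0, 4.0, 0.3, 0.4, 1.0, -2.0], n=2, rho=1.0))
```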

In our experiments we assume that $H$ corresponds to a convolution operator associated to a Gaussian kernel, with reflective boundary conditions, so that the matrix-vector products involving $H$ can be performed via the Discrete Cosine Transform [22].

We define a set of test problems in the following way: a reference image has been rescaled so that the pixel values lie in a specified range (this is for simulating different noise levels), then it has been blurred by convolution with a Gaussian kernel with standard deviation $\sigma_{psf}$ and the background has been added. Finally, Poisson noise has been simulated with the Matlab imnoise function, obtaining the noisy blurred image $b$. The details of each test problem are listed in Table 6.1. The regularization parameter $\rho$ has been manually tuned to obtain a visually satisfactory solution. For each test problem we numerically compute the optimal value $f^*$ by running the considered algorithms for a huge number of iterations, retaining the smallest value found.

Algorithm VMILA Variable Metric Inexact Line-search Algorithm (VMILA)

Choose $0 < \alpha_{\min} \le \alpha_{\max}$, $\mu \ge 1$, $\delta, \beta \in (0,1)$, $\gamma \in [0,1]$, $\eta \in (0,1]$, $x^{(0)} \in \Omega$.

For $k = 0, 1, 2, \dots$

1. Choose $\alpha_k \in [\alpha_{\min},\alpha_{\max}]$, $1 \le \mu_k \le \mu$ and $D_k \in \mathcal{M}_{\mu_k}$.

2. Compute $\tilde y^{(k)}$: compute a dual vector $v^{(l)} \in \mathbb{R}^m$ and the corresponding primal vector $\tilde y^{(k,l)}$ such that (5.9) is satisfied, then set $\tilde y^{(k)} = \tilde y^{(k,l)}$.

3. Set $d^{(k)} = \tilde y^{(k)} - x^{(k)}$.

4. Compute the steplength parameter $\lambda^{(k)}$ with Algorithm LS.

5. Set $x^{(k+1)} = x^{(k)} + \lambda^{(k)} d^{(k)}$.

We implement our inexact algorithm, summarized in Algorithm VMILA, in the Matlab environment with the following settings:

Step 1, metric selection: the scaling matrix $D_k$ is chosen by mimicking the split-gradient idea [23]. In particular, at each outer iteration it is defined as the diagonal matrix with positive entries given by

$$(D_k^{-1})_{ii} = \max\Big\{\min\Big\{\frac{x_i^{(k)}}{(H^T\mathbf{1})_i},\ \mu_k\Big\},\ \frac{1}{\mu_k}\Big\},$$

where $\mu_k = \sqrt{1 + 10^{10}/k^2}$, so that assumption (H2') is satisfied. We choose a large initial range for the scaling matrix selection to allow more freedom of choice at the first iterates, where the benefits of the scaling matrix are more relevant [8].

Step 1, steplength selection: the parameter $\alpha_k$ is chosen by the same strategy used e.g. in [10, 28, 27], and its value is constrained in the interval $[\alpha_{\min},\alpha_{\max}]$ with $\alpha_{\min} = 10^{-5}$, $\alpha_{\max} = 10^{2}$.

Step 2, computation of the approximated proximal point $\tilde y^{(k)}$: we tested different inner solvers applied to the primal-dual or to the dual formulation of the inner problem. The best performance has been obtained by choosing FISTA applied to the dual problem (5.8), in the variant proposed in [11], which ensures the convergence not only of the objective function values to the optimal one but also of the iterates to the minimum point. In particular, we set $t_l = (l + a - 1)/a$, with $a = 2.1$, in [11, formula (5)]. For brevity, in the following we report only the results obtained by stopping the inner iterates when criterion (5.10) is met, which corresponds to both an η- and an ε-approximation (see Section 5.5). A maximum number of 1500 inner iterations is also imposed. The initial guess of the inner loop at the first outer iterate is the vector of all zeros, while at all successive iterates the inner solver is initialized with the dual solution computed at the previous iterate.

Other parameter settings: the line-search parameters $\delta$, $\beta$, $\gamma$ have been set respectively equal to $0.5$, $10^{-4}$, $1$.
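To make the structure of Algorithm VMILA concrete, the sketch below implements its outer loop on a deliberately simplified problem that is our assumption, not the image restoration test above: $f_0(x) = \frac12\|Bx - b\|^2$ with a nonnegative matrix $B$ and $f_1(x) = \rho\sum_i x_i + \iota_{\mathbb{R}^n_{\ge 0}}(x)$, so that the scaled proximal point of Step 2 is available in closed form (i.e. $\eta = 1$ and no inner solver is needed). It also assumes a fixed $\alpha_k$, a hypothetical split-gradient-like diagonal metric $D_k^{-1} = \mathrm{diag}(x^{(k)}/(B^TBx^{(k)}))$ clipped to $[1/\mu_k, \mu_k]$ with $\mu_k = \sqrt{1 + 10^{10}/k^2}$ (iterations are counted from $k = 1$ to avoid dividing by zero), and an Armijo loop that reflects our reading of Algorithm LS.

```python
import numpy as np


def vmila_exact_sketch(B, b, rho, x0, alpha=1.0, n_iter=100, beta=1e-4, delta=0.5, gamma=1.0):
    """Simplified VMILA outer loop with an exact scaled proximal point (eta = 1)."""
    f = lambda v: 0.5 * np.sum((B @ v - b) ** 2) + rho * np.sum(v)   # f0 + f1 on v >= 0
    x = x0.copy()
    history = [f(x)]
    for k in range(1, n_iter + 1):                     # k starts at 1 so that mu_k is finite
        mu_k = np.sqrt(1.0 + 1e10 / k**2)
        g = B.T @ (B @ x - b)                           # gradient of f0
        # Step 1: diagonal of D_k^{-1}, split-gradient-like ratio clipped to [1/mu_k, mu_k]
        s = np.clip(x / (B.T @ (B @ x) + 1e-12), 1.0 / mu_k, mu_k)
        # Step 2: exact minimiser of (1/(2 alpha))||y - z||_{D_k}^2 + rho*sum(y) over y >= 0
        z = x - alpha * s * g
        y = np.maximum(z - alpha * rho * s, 0.0)
        d = y - x                                       # Step 3
        h_gamma = g @ d + gamma / (2.0 * alpha) * np.sum(d**2 / s) + rho * (y.sum() - x.sum())
        if h_gamma > -1e-12:                            # x is (numerically) stationary
            break
        lam, fx = 1.0, f(x)                             # Step 4: Armijo backtracking
        while f(x + lam * d) > fx + beta * lam * h_gamma and lam > 1e-12:
            lam *= delta
        x = x + lam * d                                 # Step 5
        history.append(f(x))
    return x, history


rng = np.random.default_rng(2)
B = rng.uniform(0.0, 1.0, (40, 15))
b = B @ rng.uniform(0.0, 1.0, 15) + 0.1 * rng.uniform(0.0, 1.0, 40)
x, hist = vmila_exact_sketch(B, b, rho=0.5, x0=np.ones(15))
print(hist[0], hist[-1])
```

In the inexact setting of this section, the closed-form Step 2 would be replaced by a dual inner solver stopped by (5.9) or (5.10); the outer loop is unchanged.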

All the following results have been obtained on a PC equipped with an Intel Core i7-2620M processor with CPU at 2.70GHz and 8GB of RAM, running Windows 7 OS and MATLAB Version 7 (R2010b).

We first investigate the impact of the choice of the inexactness parameter $\eta$ on the overall method. In Figure 6.1 the relative decrease of the objective function values in the first 500 iterations is reported with respect to both the iteration number (first row) and the computational time in seconds (second row). It can be observed that a higher precision in the inner solver can accelerate the progress toward the solution, but this usually results in a very large number of inner iterations and, consequently, is extremely time consuming (for example, for the test problem cameraman with $\eta = 10^{-6}, 10^{-2}, 5 \cdot 10^{-1}$ the mean number of inner iterations per outer iteration is 28, 54, 409, respectively, so that larger values of $\eta$ correspond to a more accurate inner solution). This is typical of inexact algorithms based on the iterative solution of an inner subproblem. We find that a good balance between convergence speed and computational cost is obtained by allowing a relatively large tolerance, corresponding to $\eta = 10^{-6}$.
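A small helper for the two quantities used in this comparison may help when reading Figure 6.1. We assume that the relative decrease is measured as $(f(x^{(k)}) - f^*)/|f^*|$, with $f^*$ a precomputed reference value; the section does not spell out this formula, so both the definition and the function names are assumptions. The input values in the usage lines are arbitrary, not data from the experiments.

```python
def relative_decrease(f_values, f_star):
    """Relative objective gap (f(x^k) - f*) / |f*| for each recorded value.

    Assumes f_star is a precomputed, nonzero reference value (hypothetical
    definition; the section does not state the exact formula).
    """
    return [(fk - f_star) / abs(f_star) for fk in f_values]


def mean_inner_iterations(inner_counts):
    """Mean number of inner iterations per outer iteration."""
    return sum(inner_counts) / len(inner_counts)


# Arbitrary illustrative numbers, not results from the paper
gaps = relative_decrease([12.0, 10.5, 10.0], f_star=10.0)
avg = mean_inner_iterations([10, 20, 30])
```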

As a further benchmark, we compare our algorithm with a well-established state-of-the-art method, the Chambolle and Pock method (CP) [12], which, in the notation of [12], has been implemented by setting $G(x) = \iota_{\mathbb{R}^n_{\ge 0}}(x)$ and $F(Kx) = \mathrm{KL}(Hx + bg, b) + \beta \sum_{i=1}^{n} \|\nabla_i x\|$, with $K = (H^T, \nabla_1^T, \dots, \nabla_n^T)^T$. In this way the resolvent operator associated with $F^*$ can be computed in closed form. In Figure 6.2 we compare the behaviour of our approach (with $\eta = 10^{-6}$) with that of CP (2000 iterations) for different choices of its two parameters $\sigma$ and $\tau$ (once $\tau$ is selected, $\sigma$ is chosen such that $\sigma\tau L^2 = 1$, where $L = \|K\|$). We can observe that CP is quite sensitive to these parameters and that it is difficult to devise, in general, the most convenient choice, while our approach, with the parameter settings described above, seems to be always comparable to the best results obtained by CP in terms of objective function decrease with respect to both the iteration number and the computational time.
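To make the step-size coupling concrete, the following Python sketch runs Chambolle-Pock iterations with $\sigma$ derived from $\tau$ through $\sigma\tau L^2 = 1$, $L = \|K\|$. As a stand-in for the KL plus total variation objective, which we do not reproduce, it uses the toy problem $\min_{x \ge 0} \frac{1}{2}\|Kx - b\|^2$: $G$ is the indicator of the nonnegative orthant (its prox is a projection) and $F(z) = \frac{1}{2}\|z - b\|^2$, whose conjugate has the closed-form prox $(v - \sigma b)/(1 + \sigma)$. The function name and the toy data are hypothetical.

```python
import numpy as np


def chambolle_pock_nnls(K, b, tau, n_iter=3000):
    """Chambolle-Pock iterations for min_{x >= 0} 0.5 * ||Kx - b||^2.

    G is the indicator of the nonnegative orthant and F(z) = 0.5*||z - b||^2;
    this toy problem stands in for the KL + TV objective of the experiments.
    """
    L = np.linalg.norm(K, 2)        # operator norm ||K|| (largest singular value)
    sigma = 1.0 / (tau * L ** 2)    # enforce sigma * tau * L^2 = 1
    x = np.zeros(K.shape[1])
    x_bar = x.copy()
    y = np.zeros(K.shape[0])
    for _ in range(n_iter):
        # Dual step: prox of sigma*F^*, where F^*(y) = 0.5*||y||^2 + <b, y>
        y = (y + sigma * (K @ x_bar) - sigma * b) / (1.0 + sigma)
        # Primal step: prox of tau*G is the projection onto x >= 0
        x_new = np.maximum(x - tau * (K.T @ y), 0.0)
        # Extrapolation with theta = 1
        x_bar = 2.0 * x_new - x
        x = x_new
    return x


# Toy usage: b = K @ x_true with x_true > 0, so x_true is the minimizer
rng = np.random.default_rng(1)
K = rng.standard_normal((20, 5))
x_true = rng.uniform(1.0, 2.0, 5)
b = K @ x_true
x_cp = chambolle_pock_nnls(K, b, tau=1.0 / np.linalg.norm(K, 2))
```

Re-running the toy with different values of `tau` (and hence `sigma`) is a simple way to reproduce, on a small scale, the kind of parameter study shown in Figure 6.2.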

7. Conclusions and future work. In this paper we presented and analyzed an inexact variable metric forward-backward method based on an Armijo-type line search along a suitable descent direction. The inexactness of the method lies in the possibility of using an approximation of the proximal operator, while the underlying metric may change at each iteration and non-Euclidean metrics are also allowed. We carried out the convergence analysis of the method, obtaining results in both the nonconvex and the convex case and also providing a convergence rate estimate in the latter. The main strengths of the method are listed below.

• The convergence is ensured by a line-search procedure, which does not depend on any user-supplied parameter (actually the constants $\gamma$, $\beta$, $\delta$ have to be chosen, but the behaviour of the whole algorithm is not sensitive to these choices). On the other hand, the “free” parameter $\alpha$ in (3.4) could be exploited to accelerate convergence.

• The possibility of using at each iteration an approximation of $p(x^{(k)}; h_\alpha)$ makes the method well suited to the solution of a wide variety of structured problems.

• The numerical results on large-scale convex problems show that the performance of the inexact method is promising and comparable with that of a state-of-the-art method.
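The ingredients summarized above (variable metric, proximal point, descent direction, Armijo-type backtracking) can be combined in a minimal Python sketch. It assumes $f_0(x) = \frac{1}{2}\|Ax - b\|^2$, $f_1(x) = \rho\|x\|_1$ and a diagonal metric $D_k$, for which the proximal point is computed exactly by componentwise soft thresholding, so the inexactness discussed in the paper is not exercised. The decrease measure $\Delta_k = \nabla f_0(x^{(k)})^T d^{(k)} + \frac{\gamma}{2\alpha}\|d^{(k)}\|^2_{D_k} + f_1(y^{(k)}) - f_1(x^{(k)})$ is our assumption about how $\gamma$ enters; all names, including the hypothetical `metric` callable, are illustrative.

```python
import numpy as np


def soft_threshold(v, thr):
    # Componentwise proximal map of thr * |.|
    return np.sign(v) * np.maximum(np.abs(v) - thr, 0.0)


def vm_forward_backward(A, b, rho, alpha=1.0, metric=None, n_iter=200,
                        beta=1e-4, delta=0.5, gamma=1.0):
    """Toy variable-metric forward-backward method with Armijo backtracking.

    f0(x) = 0.5*||Ax - b||^2, f1(x) = rho*||x||_1.  `metric(k)` is a
    hypothetical callable returning the positive diagonal of D_k
    (identity if None).  The proximal point is computed exactly here.
    """
    n = A.shape[1]
    x = np.zeros(n)

    def f(z):
        return 0.5 * np.sum((A @ z - b) ** 2) + rho * np.sum(np.abs(z))

    for k in range(n_iter):
        dk = np.ones(n) if metric is None else metric(k)
        g = A.T @ (A @ x - b)
        # Proximal point of f1 in the metric D_k / alpha
        y = soft_threshold(x - alpha * g / dk, alpha * rho / dk)
        d = y - x
        # Assumed decrease measure; gamma weights the quadratic term
        Delta = (g @ d + gamma / (2 * alpha) * np.sum(dk * d * d)
                 + rho * (np.sum(np.abs(y)) - np.sum(np.abs(x))))
        if Delta > -1e-14:
            break  # y == x up to rounding: x is (numerically) stationary
        fx, step = f(x), 1.0
        # Armijo-type backtracking over step = delta**m, capped at 60 reductions
        for _ in range(60):
            if f(x + step * d) <= fx + beta * step * Delta:
                break
            step *= delta
        x = x + step * d
    return x


# Toy check: with orthonormal columns the minimizer is soft_threshold(Q^T b, rho)
rng = np.random.default_rng(2)
Q, _ = np.linalg.qr(rng.standard_normal((20, 5)))
b = rng.standard_normal(20)
x_id = vm_forward_backward(Q, b, rho=0.1)
x_vm = vm_forward_backward(Q, b, rho=0.1, metric=lambda k: np.full(5, 2.0))
x_ref = soft_threshold(Q.T @ b, 0.1)
```

The toy check uses a matrix with orthonormal columns because then the minimizer is known in closed form; the constant metric $D_k = 2I$ only illustrates the `metric` hook and converges more slowly than the identity here.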

Future work will be devoted especially to deepening the theoretical and numerical analysis in the nonconvex case, investigating the possibility of obtaining convergence results stronger than those stated in Theorems 4.1 and 4.2, at least for some classes of nonconvex functions (e.g. Kurdyka-Łojasiewicz functions).